Preface


“Audience, level,
and  treatment  — 
a description  of 
such  matters  is 
what  prefaces  are 
supposed  to  be 
about.” 

— P. R. Halmos [142]


“People do acquire
a little  brief  author- 
ity by  equipping 
themselves  with 
jargon:  they  can 
pontificate  and  air  a 
superficial  expertise. 
But  what  we  should 
ask  of  educated 
mathematicians  is 
not  what  they  can 
speechify  about, 
nor  even  what  they 
know  about  the 
existing  corpus 
of  mathematical 
knowledge,  but 
rather  what  can 
they  now  do  with 
their learning and
whether  they  can 
actually  solve  math- 
ematical problems 
arising  in  practice. 

In  short,  we  look  for 
deeds  not  words.” 

— J.  Hammersley  [145] 


THIS BOOK IS BASED on a course of the same name that has been taught
annually at Stanford University since 1970. About fifty students have taken it
each year (juniors and seniors, but mostly graduate students), and alumni
of these classes have begun to spawn similar courses elsewhere. Thus the time
seems ripe to present the material to a wider audience (including sophomores).

It  was  a dark  and  stormy  decade  when  Concrete  Mathematics  was  born. 
Long-held  values  were  constantly  being  questioned  during  those  turbulent 
years;  college  campuses  were  hotbeds  of  controversy.  The  college  curriculum 
itself  was  challenged,  and  mathematics  did  not  escape  scrutiny.  John  Ham- 
mersley had  just  written  a thought-provoking  article  “On  the  enfeeblement  of 
mathematical  skills  by  ‘Modern  Mathematics’  and  by  similar  soft  intellectual 
trash in schools and universities” [145]; other worried mathematicians [272]
even asked, “Can mathematics be saved?” One of the present authors had
embarked on a series of books called The Art of Computer Programming, and
in  writing  the  first  volume  he  (DEK)  had  found  that  there  were  mathematical 
tools  missing  from  his  repertoire;  the  mathematics  he  needed  for  a thorough, 
well-grounded  understanding  of  computer  programs  was  quite  different  from 
what  he’d  learned  as  a mathematics  major  in  college.  So  he  introduced  a new 
course,  teaching  what  he  wished  somebody  had  taught  him. 

The  course  title  “Concrete  Mathematics”  was  originally  intended  as  an 
antidote  to  “Abstract  Mathematics,”  since  concrete  classical  results  were  rap- 
idly being  swept  out  of  the  modern  mathematical  curriculum  by  a new  wave 
of abstract ideas popularly called the “New Math.” Abstract mathematics is a
wonderful  subject,  and  there’s  nothing  wrong  with  it:  It’s  beautiful,  general, 
and  useful.  But  its  adherents  had  become  deluded  that  the  rest  of  mathemat- 
ics was  inferior  and  no  longer  worthy  of  attention.  The  goal  of  generalization 
had  become  so  fashionable  that  a generation  of  mathematicians  had  become 
unable  to  relish  beauty  in  the  particular,  to  enjoy  the  challenge  of  solving 
quantitative  problems,  or  to  appreciate  the  value  of  technique.  Abstract  math- 
ematics was  becoming  inbred  and  losing  touch  with  reality;  mathematical  ed- 
ucation needed  a concrete  counterweight  in  order  to  restore  a healthy  balance. 

When  DEK  taught  Concrete  Mathematics  at  Stanford  for  the  first  time, 
he  explained  the  somewhat  strange  title  by  saying  that  it  was  his  attempt 
to teach a math course that was hard instead of soft. He announced that,
contrary to the expectations of some of his colleagues, he was not going to
teach the Theory of Aggregates, nor Stone’s Embedding Theorem, nor even
the Stone–Čech compactification. (Several students from the civil engineering
department  got  up  and  quietly  left  the  room.) 

Although  Concrete  Mathematics  began  as  a reaction  against  other  trends, 
the  main  reasons  for  its  existence  were  positive  instead  of  negative.  And  as 
the  course  continued  its  popular  place  in  the  curriculum,  its  subject  matter 
“solidified”  and  proved  to  be  valuable  in  a variety  of  new  applications.  Mean- 
while, independent  confirmation  for  the  appropriateness  of  the  name  came 
from  another  direction,  when  Z.  A.  Melzak  published  two  volumes  entitled 
Companion to Concrete Mathematics [214].

The  material  of  concrete  mathematics  may  seem  at  first  to  be  a disparate 
bag  of  tricks,  but  practice  makes  it  into  a disciplined  set  of  tools.  Indeed,  the 
techniques  have  an  underlying  unity  and  a strong  appeal  for  many  people. 
When  another  one  of  the  authors  (RLG)  first  taught  the  course  in  1979,  the 
students  had  such  fun  that  they  decided  to  hold  a class  reunion  a year  later. 

But what exactly is Concrete Mathematics? It is a blend of CONtinuous
and disCRETE mathematics. More concretely, it is the controlled manipulation
of  mathematical  formulas,  using  a collection  of  techniques  for  solving  prob- 
lems. Once  you,  the  reader,  have  learned  the  material  in  this  book,  all  you 
will  need  is  a cool  head,  a large  sheet  of  paper,  and  fairly  decent  handwriting 
in  order  to  evaluate  horrendous-looking  sums,  to  solve  complex  recurrence 
relations,  and  to  discover  subtle  patterns  in  data.  You  will  be  so  fluent  in 
algebraic  techniques  that  you  will  often  find  it  easier  to  obtain  exact  results 
than  to  settle  for  approximate  answers  that  are  valid  only  in  a limiting  sense. 

The  major  topics  treated  in  this  book  include  sums,  recurrences,  ele- 
mentary number  theory,  binomial  coefficients,  generating  functions,  discrete 
probability,  and  asymptotic  methods.  The  emphasis  is  on  manipulative  tech- 
nique rather  than  on  existence  theorems  or  combinatorial  reasoning;  the  goal 
is  for  each  reader  to  become  as  familiar  with  discrete  operations  (like  the 
greatest-integer  function  and  finite  summation)  as  a student  of  calculus  is 
familiar  with  continuous  operations  (like  the  absolute-value  function  and  in- 
finite integration). 

Notice  that  this  list  of  topics  is  quite  different  from  what  is  usually  taught 
nowadays in undergraduate courses entitled “Discrete Mathematics.” Therefore
the subject needs a distinctive name, and “Concrete Mathematics” has
proved  to  be  as  suitable  as  any  other. 

The original textbook for Stanford’s course on concrete mathematics was
the  “Mathematical  Preliminaries”  section  in  The  Art  of  Computer  Program- 
ming [173].  But  the  presentation  in  those  110  pages  is  quite  terse,  so  another 
author (OP) was inspired to draft a lengthy set of supplementary notes. The


“The heart of mathematics
consists of concrete
examples and concrete
problems.”

— P. R. Halmos [141]


“It is downright
sinful to teach the
abstract before the
concrete.”

— Z. A. Melzak [214]


Concrete Mathematics
is a bridge
to abstract mathematics.


“The advanced
reader who skips
parts that appear
too elementary may
miss more than
the less advanced
reader who skips
parts that appear
too complex.”

— G. Pólya [238]


(We’re not bold
enough to try
Distinuous Mathematics.)
“. . . a concrete
life preserver
thrown to students
sinking in a sea of
abstraction.”

— W. Gottschalk


Math  graffiti: 
Kilroy  wasn’t  Haar. 
Free  the  group. 
Nuke  the  kernel. 
Power  to  the  n. 
N=1 ⇒ P=NP.


I have  only  a 
marginal  interest 
in  this  subject. 


This  was  the  most 
enjoyable  course 
I’ve  ever  had.  But 
it  might  be  nice 
to  summarize  the 
material  as  you 
go  along. 


present  book  is  an  outgrowth  of  those  notes;  it  is  an  expansion  of,  and  a more 
leisurely  introduction  to,  the  material  of  Mathematical  Preliminaries.  Some  of 
the  more  advanced  parts  have  been  omitted;  on  the  other  hand,  several  topics 
not  found  there  have  been  included  here  so  that  the  story  will  be  complete. 

The  authors  have  enjoyed  putting  this  book  together  because  the  subject 
began  to  jell  and  to  take  on  a life  of  its  own  before  our  eyes;  this  book  almost 
seemed  to  write  itself.  Moreover,  the  somewhat  unconventional  approaches 
we  have  adopted  in  several  places  have  seemed  to  fit  together  so  well,  after 
these  years  of  experience,  that  we  can’t  help  feeling  that  this  book  is  a kind 
of  manifesto  about  our  favorite  way  to  do  mathematics.  So  we  think  the  book 
has  turned  out  to  be  a tale  of  mathematical  beauty  and  surprise,  and  we  hope 
that our readers will share at least ε of the pleasure we had while writing it.

Since  this  book  was  born  in  a university  setting,  we  have  tried  to  capture 
the  spirit  of  a contemporary  classroom  by  adopting  an  informal  style.  Some 
people  think  that  mathematics  is  a serious  business  that  must  always  be  cold 
and  dry;  but  we  think  mathematics  is  fun,  and  we  aren’t  ashamed  to  admit 
the  fact.  Why  should  a strict  boundary  line  be  drawn  between  work  and 
play?  Concrete  mathematics  is  full  of  appealing  patterns;  the  manipulations 
are  not  always  easy,  but  the  answers  can  be  astonishingly  attractive.  The 
joys  and  sorrows  of  mathematical  work  are  reflected  explicitly  in  this  book 
because  they  are  part  of  our  lives. 

Students  always  know  better  than  their  teachers,  so  we  have  asked  the 
first students of this material to contribute their frank opinions, as “graffiti”
in  the  margins.  Some  of  these  marginal  markings  are  merely  corny,  some 
are  profound;  some  of  them  warn  about  ambiguities  or  obscurities,  others 
are  typical  comments  made  by  wise  guys  in  the  back  row;  some  are  positive, 
some  are  negative,  some  are  zero.  But  they  all  are  real  indications  of  feelings 
that  should  make  the  text  material  easier  to  assimilate.  (The  inspiration  for 
such  marginal  notes  comes  from  a student  handbook  entitled  Approaching 
Stanford,  where  the  official  university  line  is  counterbalanced  by  the  remarks 
of  outgoing  students.  For  example,  Stanford  says,  “There  are  a few  things 
you  cannot  miss  in  this  amorphous  shape  which  is  Stanford”;  the  margin 
says,  “Amorphous  . . . what  the  h***  does  that  mean?  Typical  of  the  pseudo- 
intellectualism  around  here.”  Stanford:  “There  is  no  end  to  the  potential  of 
a group  of  students  living  together.”  Graffito:  “Stanford  dorms  are  like  zoos 
without a keeper.”)

The  margins  also  include  direct  quotations  from  famous  mathematicians 
of  past  generations,  giving  the  actual  words  in  which  they  announced  some 
of  their  fundamental  discoveries.  Somehow  it  seems  appropriate  to  mix  the 
words  of  Leibniz,  Euler,  Gauss,  and  others  with  those  of  the  people  who 
will  be  continuing  the  work.  Mathematics  is  an  ongoing  endeavor  for  people 
everywhere;  many  strands  are  being  woven  into  one  rich  fabric. 
This book contains more than 500 exercises, divided into six categories:

• Warmups  are  exercises  that  every  reader  should  try  to  do  when  first 
reading  the  material. 

• Basics  are  exercises  to  develop  facts  that  are  best  learned  by  trying 
one’s own derivation rather than by reading somebody else’s.

• Homework  exercises  are  problems  intended  to  deepen  an  understand- 
ing of  material  in  the  current  chapter. 

• Exam  problems  typically  involve  ideas  from  two  or  more  chapters  si- 
multaneously; they  are  generally  intended  for  use  in  take-home  exams 
(not  for  in-class  exams  under  time  pressure). 

• Bonus  problems  go  beyond  what  an  average  student  of  concrete  math- 
ematics is  expected  to  handle  while  taking  a course  based  on  this  book; 
they  extend  the  text  in  interesting  ways. 

• Research problems may or may not be humanly solvable, but the ones
presented  here  seem  to  be  worth  a try  (without  time  pressure). 

Answers  to  all  the  exercises  appear  in  Appendix  A,  often  with  additional  infor- 
mation about  related  results.  (Of  course,  the  “answers”  to  research  problems 
are  incomplete;  but  even  in  these  cases,  partial  results  or  hints  are  given  that 
might  prove  to  be  helpful.)  Readers  are  encouraged  to  look  at  the  answers, 
especially  the  answers  to  the  warmup  problems,  but  only  after  making  a 
serious  attempt  to  solve  the  problem  without  peeking. 

We  have  tried  in  Appendix  C to  give  proper  credit  to  the  sources  of 
each  exercise,  since  a great  deal  of  creativity  and/or  luck  often  goes  into 
the  design  of  an  instructive  problem.  Mathematicians  have  unfortunately 
developed  a tradition  of  borrowing  exercises  without  any  acknowledgment; 
we  believe  that  the  opposite  tradition,  practiced  for  example  by  books  and 
magazines  about  chess  (where  names,  dates,  and  locations  of  original  chess 
problems  are  routinely  specified)  is  far  superior.  However,  we  have  not  been 
able  to  pin  down  the  sources  of  many  problems  that  have  become  part  of  the 
folklore.  If  any  reader  knows  the  origin  of  an  exercise  for  which  our  citation 
is  missing  or  inaccurate,  we  would  be  glad  to  learn  the  details  so  that  we  can 
correct  the  omission  in  subsequent  editions  of  this  book. 

The  typeface  used  for  mathematics  throughout  this  book  is  a new  design 
by  Hermann  Zapf  [310],  commissioned  by  the  American  Mathematical  Society 
and  developed  with  the  help  of  a committee  that  included  B.  Beeton,  R.  P. 
Boas, L. K. Durst, D. E. Knuth, P. Murdock, R. S. Palais, P. Renz, E. Swanson,
S. B. Whidden, and W. B. Woolf. The underlying philosophy of Zapf’s design
is  to  capture  the  flavor  of  mathematics  as  it  might  be  written  by  a mathemati- 
cian with  excellent  handwriting.  A handwritten  rather  than  mechanical  style 
is  appropriate  because  people  generally  create  mathematics  with  pen,  pencil, 


I see:
Concrete mathematics
means drilling.


The  homework  was 
tough but I learned
a lot.  It  was  worth 
every  hour. 


Take-home  exams 
are vital: keep
them. 

Exams  were  harder 
than  the  homework 
led me to expect.


Cheaters  may  pass 
this  course  by  just 
copying  the  an- 
swers, but  they’re 
only  cheating 
themselves. 


Difficult  exams 
don’t  take  into  ac- 
count students  who 
have  other  classes 
to  prepare  for. 
I’m unaccustomed
to  this  face. 


Dear  prof:  Thanks 

for  (1)  the  puns, 

(2)  the  subject 
matter. 


I don’t see how
what  I’ve  learned 
will  ever  help  me. 


I had a lot of trouble
in this class, but
I know it sharpened
my  math  skills  and 
my  thinking  skills. 


I would  advise  the 
casual  student  to 
stay  away  from  this 
course. 


or  chalk.  (For  example,  one  of  the  trademarks  of  the  new  design  is  the  symbol 
for zero, ‘0’, which is slightly pointed at the top because a handwritten zero
rarely  closes  together  smoothly  when  the  curve  returns  to  its  starting  point.) 
The  letters  are  upright,  not  italic,  so  that  subscripts,  superscripts,  and  ac- 
cents are  more  easily  fitted  with  ordinary  symbols.  This  new  type  family  has 
been  named  AMS  Euler,  after  the  great  Swiss  mathematician  Leonhard  Euler 
(1707-1783)  who  discovered  so  much  of  mathematics  as  we  know  it  today. 
The alphabets include Euler Text (Aa Bb Cc through Xx Yy Zz), Euler Fraktur
(𝔄𝔞 𝔅𝔟 ℭ𝔠 through 𝔛𝔵 𝔜𝔶 ℨ𝔷), and Euler Script Capitals (𝒜 ℬ 𝒞 through
𝒳 𝒴 𝒵), as well as Euler Greek (Αα Ββ Γγ through Χχ Ψψ Ωω) and special
symbols such as ℘ and ℵ. We are especially pleased to be able to inaugurate
the  Euler  family  of  typefaces  in  this  book,  because  Leonhard  Euler’s  spirit 
truly  lives  on  every  page:  Concrete  mathematics  is  Eulerian  mathematics. 

The  authors  are  extremely  grateful  to  Andrei  Broder,  Ernst  Mayr,  An- 
drew Yao,  and  Frances  Yao,  who  contributed  greatly  to  this  book  during  the 
years  that  they  taught  Concrete  Mathematics  at  Stanford.  Furthermore  we 
offer  1024  thanks  to  the  teaching  assistants  who  creatively  transcribed  what 
took  place  in  class  each  year  and  who  helped  to  design  the  examination  ques- 
tions; their  names  are  listed  in  Appendix  C.  This  book,  which  is  essentially 
a compendium  of  sixteen  years’  worth  of  lecture  notes,  would  have  been  im- 
possible without  their  first-rate  work. 

Many  other  people  have  helped  to  make  this  book  a reality.  For  example, 
we  wish  to  commend  the  students  at  Brown,  Columbia,  CUNY,  Princeton, 
Rice,  and  Stanford  who  contributed  the  choice  graffiti  and  helped  to  debug 
our  first  drafts.  Our  contacts  at  Addison-Wesley  were  especially  efficient 
and  helpful;  in  particular,  we  wish  to  thank  our  publisher  (Peter  Gordon), 
production supervisor (Bette Aaronson), designer (Roy Brown), and copy editor
(Lyn Dupré). The National Science Foundation and the Office of Naval
Research  have  given  invaluable  support.  Cheryl  Graham  was  tremendously 
helpful  as  we  prepared  the  index.  And  above  all,  we  wish  to  thank  our  wives 
(Fan,  Jill,  and  Amy)  for  their  patience,  support,  encouragement,  and  ideas. 

We  have  tried  to  produce  a perfect  book,  but  we  are  imperfect  authors. 
Therefore  we  solicit  help  in  correcting  any  mistakes  that  we’ve  made.  A re- 
ward of  $2.56  will  gratefully  be  paid  to  the  first  finder  of  any  error,  whether 
it  is  mathematical,  historical,  or  typographical. 

Murray Hill, New Jersey        RLG
and Stanford, California       DEK
May 1988                       OP


A Note  on  Notation 


SOME  OF  THE  SYMBOLISM  in  this  book  has  not  (yet?)  become  standard. 
Here  is  a list  of  notations  that  might  be  unfamiliar  to  readers  who  have  learned 
similar  material  from  other  books,  together  with  the  page  numbers  where 
these  notations  are  explained: 


Notation                          Name                                                                   Page

$\ln x$                           natural logarithm: $\log_e x$                                           262
$\lg x$                           binary logarithm: $\log_2 x$                                             70
$\log x$                          common logarithm: $\log_{10} x$                                         435
$\lfloor x \rfloor$               floor: $\max\{\,n \mid n \le x,\ \text{integer } n\,\}$                   67
$\lceil x \rceil$                 ceiling: $\min\{\,n \mid n \ge x,\ \text{integer } n\,\}$                 67
$x \bmod y$                       remainder: $x - y\lfloor x/y \rfloor$                                     82
$\{x\}$                           fractional part: $x \bmod 1$                                              70
$\sum f(x)\,\delta x$             indefinite summation                                                     48
$\sum\nolimits_a^b f(x)\,\delta x$  definite summation                                                     49
$x^{\underline{n}}$               falling factorial power: $x!/(x-n)!$                                      47
$x^{\overline{n}}$                rising factorial power: $\Gamma(x+n)/\Gamma(x)$                           48
$n¡$                              subfactorial: $n!/0! - n!/1! + \cdots + (-1)^n n!/n!$                    194
$\Re z$                           real part: $x$, if $z = x + iy$                                           64
$\Im z$                           imaginary part: $y$, if $z = x + iy$                                      64
$H_n$                             harmonic number: $1/1 + \cdots + 1/n$                                     29
$H_n^{(x)}$                       generalized harmonic number: $1/1^x + \cdots + 1/n^x$                    263
$f^{(m)}(z)$                      $m$th derivative of $f$ at $z$                                           456


If  you  don't  under- 
stand what  the 
x denotes  at  the 
bottom  of  this  page, 
try  asking  your 
Latin  professor 
instead  of  your 
math  professor. 




Prestressed  concrete 
mathematics  is  con- 
crete mathematics 
that's  preceded  by 
a bewildering  list 
of  notations. 


Also  ‘nonstring'  is 
a string. 


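Notation                                   Name                                                          Page

$\left[{n \atop m}\right]$                 Stirling cycle number (the “first kind”)                       245
$\left\{{n \atop m}\right\}$               Stirling subset number (the “second kind”)                     244
$\left\langle{n \atop m}\right\rangle$     Eulerian number                                                253
$\left\langle\!\left\langle{n \atop m}\right\rangle\!\right\rangle$  Second-order Eulerian number         256
$(a_m \ldots a_0)_b$                       radix notation for $\sum_{k=0}^{m} a_k b^k$                     11
$K(a_1,\ldots,a_n)$                        continuant polynomial                                          288
$F\left({a,b \atop c}\,\Big|\,z\right)$    hypergeometric function                                        205
$\#A$                                      cardinality: number of elements in the set $A$                  39
$[z^n]\,f(z)$                              coefficient of $z^n$ in $f(z)$                                 197
$[\alpha\,..\,\beta]$                      closed interval: the set $\{x \mid \alpha \le x \le \beta\}$    73
$[m = n]$                                  1 if $m = n$, otherwise 0 *                                     24
$[m \backslash n]$                         1 if $m$ divides $n$, otherwise 0 *                            102
$[m \backslash\backslash n]$               1 if $m$ exactly divides $n$, otherwise 0 *                    146
$[m \perp n]$                              1 if $m$ is relatively prime to $n$, otherwise 0 *             115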


*In  general,  if  S is  any  statement  that  can  be  true  or  false,  the  bracketed 
notation  [S]  stands  for  1 if  S is  true,  0 otherwise. 
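
The bracket convention in this footnote translates directly into code. The following is a minimal Python sketch, not part of the original text; the helper name `bracket` is hypothetical, chosen only for illustration.

```python
from math import gcd


def bracket(statement):
    """Return 1 if the statement is true and 0 otherwise (the bracketed notation [S])."""
    return 1 if statement else 0


# [m = n], [m divides n], and [m is relatively prime to n] for sample values.
print(bracket(3 == 3))
print(bracket(12 % 4 == 0))
print(bracket(gcd(9, 14) == 1))

# Brackets let a condition be counted inside a sum: how many odd k satisfy 0 <= k < 10?
odd_count = sum(bracket(k % 2 == 1) for k in range(10))
print(odd_count)
```

Because the bracket is an ordinary number, it can be multiplied and summed like any other term, which is what makes the notation convenient in formulas.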

Throughout this text, we use single-quote marks (‘. . .’) to delimit text as
it is written, double-quote marks (“. . .”) for a phrase as it is spoken. Thus,
the string of letters ‘string’ is sometimes called a “string.”

An expression of the form ‘a/bc’ means the same as ‘a/(bc)’. Moreover,
log x/log y = (log x)/(log y) and 2n! = 2(n!).
Recurrent Problems


THIS  CHAPTER  EXPLORES  three  sample  problems  that  give  a feel  for 
what’s  to  come.  They  have  two  traits  in  common:  They’ve  all  been  investi- 
gated repeatedly  by  mathematicians;  and  their  solutions  all  use  the  idea  of 
recurrence, in which the solution to each problem depends on the solutions
to  smaller  instances  of  the  same  problem. 


Raise  your  hand 
if  you’ve  never 
seen  this. 

OK,  the  rest  of 
you  can  cut  to 
equation  (1.1). 


1.1 THE TOWER OF HANOI

Let’s  look  first  at  a neat  little  puzzle  called  the  Tower  of  Hanoi, 
invented  by  the  French  mathematician  Edouard  Lucas  in  1883.  We  are  given 
a tower  of  eight  disks,  initially  stacked  in  decreasing  size  on  one  of  three  pegs: 


Gold  -wow. 

Are  our  disks  made 
of  concrete? 


The  objective  is  to  transfer  the  entire  tower  to  one  of  the  other  pegs,  moving 
only  one  disk  at  a time  and  never  moving  a larger  one  onto  a smaller. 

Lucas  [208]  furnished  his  toy  with  a romantic  legend  about  a much  larger 
Tower  of  Brahma,  which  supposedly  has  64  disks  of  pure  gold  resting  on  three 
diamond  needles.  At  the  beginning  of  time,  he  said,  God  placed  these  golden 
disks  on  the  first  needle  and  ordained  that  a group  of  priests  should  transfer 
them  to  the  third,  according  to  the  rules  above.  The  priests  reportedly  work 
day  and  night  at  their  task.  When  they  finish,  the  Tower  will  crumble  and 
the  world  will  end. 
It’s not immediately obvious that the puzzle has a solution, but a little
thought  (or  having  seen  the  problem  before)  convinces  us  that  it  does.  Now 
the  question  arises:  What’s  the  best  we  can  do?  That  is,  how  many  moves 
are  necessary  and  sufficient  to  perform  the  task? 

The  best  way  to  tackle  a question  like  this  is  to  generalize  it  a bit.  The 
Tower  of  Brahma  has  64  disks  and  the  Tower  of  Hanoi  has  8;  let’s  consider 
what happens if there are $n$ disks.

One  advantage  of  this  generalization  is  that  we  can  scale  the  problem 
down  even  more.  In  fact,  we’ll  see  repeatedly  in  this  book  that  it’s  advanta- 
geous to  look  at  small  cases  first.  It’s  easy  to  see  how  to  transfer  a tower 
that  contains  only  one  or  two  disks.  And  a small  amount  of  experimentation 
shows  how  to  transfer  a tower  of  three. 

The next step in solving the problem is to introduce appropriate notation:
name and conquer. Let’s say that $T_n$ is the minimum number of moves
that will transfer $n$ disks from one peg to another under Lucas’s rules. Then
$T_1$ is obviously 1, and $T_2 = 3$.

We  can  also  get  another  piece  of  data  for  free,  by  considering  the  smallest 
case of all: Clearly $T_0 = 0$, because no moves at all are needed to transfer a
tower of $n = 0$ disks! Smart mathematicians are not ashamed to think small,
because  general  patterns  are  easier  to  perceive  when  the  extreme  cases  are 
well  understood  (even  when  they  are  trivial). 

But  now  let’s  change  our  perspective  and  try  to  think  big;  how  can  we 
transfer  a large  tower?  Experiments  with  three  disks  show  that  the  winning 
idea  is  to  transfer  the  top  two  disks  to  the  middle  peg,  then  move  the  third, 
then bring the other two onto it. This gives us a clue for transferring $n$ disks
in general: We first transfer the $n-1$ smallest to a different peg (requiring
$T_{n-1}$ moves), then move the largest (requiring one move), and finally transfer
the $n-1$ smallest back onto the largest (requiring another $T_{n-1}$ moves). Thus
we can transfer $n$ disks (for $n > 0$) in at most $2T_{n-1} + 1$ moves:

$$T_n \le 2T_{n-1} + 1, \qquad \text{for } n > 0.$$

This formula uses ‘$\le$’ instead of ‘$=$’ because our construction proves only
that $2T_{n-1} + 1$ moves suffice; we haven’t shown that $2T_{n-1} + 1$ moves are
necessary. A clever person might be able to think of a shortcut.
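
The three-step construction just described can be sketched as a short recursive program. This Python sketch is an illustration, not part of the original text: the peg labels "A", "B", "C" and the function names `hanoi_moves` and `replay` are assumptions. The second function replays the moves and checks Lucas's rules, so the sketch shows only that the construction is legal and how many moves it uses, not that it is optimal.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield (disk, from_peg, to_peg) moves transferring n disks from source to target."""
    if n == 0:
        return
    # Transfer the n-1 smallest disks out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, source, spare, target)
    # Move the largest disk (disk n) in a single move.
    yield (n, source, target)
    # Bring the n-1 smallest disks back onto the largest.
    yield from hanoi_moves(n - 1, spare, target, source)


def replay(n, moves):
    """Replay moves on three pegs, checking Lucas's rules; return the final pegs."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for disk, src, dst in moves:
        assert pegs[src] and pegs[src][-1] == disk, "disk is not on top of its peg"
        assert not pegs[dst] or pegs[dst][-1] > disk, "larger disk placed on smaller"
        pegs[dst].append(pegs[src].pop())
    return pegs


moves = list(hanoi_moves(3))
print(len(moves))        # moves used by the construction for n = 3
print(replay(3, moves))  # final contents of the three pegs
```

The generator mirrors the text step for step: two recursive transfers of $n-1$ disks around a single move of the largest disk, so its move count satisfies the same relation $2T_{n-1} + 1$.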

But  is  there  a better  way?  Actually  no.  At  some  point  we  must  move  the 
largest disk. When we do, the $n-1$ smallest must be on a single peg, and it
has taken at least $T_{n-1}$ moves to put them there. We might move the largest
disk more than once, if we’re not too alert. But after moving the largest disk
for the last time, we must transfer the $n-1$ smallest disks (which must again
be on a single peg) back onto the largest; this too requires $T_{n-1}$ moves. Hence

$$T_n \ge 2T_{n-1} + 1, \qquad \text{for } n > 0.$$
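
For small towers the lower bound can also be confirmed by brute force. The following Python sketch is an illustration, not part of the original argument: it runs a breadth-first search over every legal configuration to find the fewest moves. The function name `min_moves` and the encoding of a state as a tuple of peg numbers are assumptions made for this example.

```python
from collections import deque


def min_moves(n):
    """Breadth-first search for the fewest moves taking n disks from peg 0 to peg 2."""
    # state[d] is the peg holding disk d; disk 0 is the smallest.
    start, goal = (0,) * n, (2,) * n
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        for peg_from in range(3):
            # The top disk of a peg is the smallest disk located there.
            disks_here = [d for d in range(n) if state[d] == peg_from]
            if not disks_here:
                continue
            disk = disks_here[0]
            for peg_to in range(3):
                if peg_to == peg_from:
                    continue
                # Legal only if no smaller disk already sits on the destination peg.
                if any(state[d] == peg_to for d in range(disk)):
                    continue
                nxt = state[:disk] + (peg_to,) + state[disk + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[state] + 1
                    queue.append(nxt)
    return None


print([min_moves(n) for n in range(6)])
```

The search visits at most $3^n$ states, so it is practical only for small $n$; it is a check on the reasoning, not a substitute for it.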


Most of the published
“solutions”
to Lucas’s problem,
like the early one
of Allardice and
Fraser [7], fail to explain
why $T_n$ must
be $\ge 2T_{n-1} + 1$.
Yeah, yeah.

I seen that word
before.


Mathematical induction
proves that
we can climb as
high as we like on
a ladder, by proving
that we can climb
onto  the  bottom 
rung  (the  basis) 
and  that  from  each 
rung  we  can  climb 
up  to  the  next  one 
(the  induction). 


These two inequalities, together with the trivial solution for $n = 0$, yield

$$\begin{aligned} T_0 &= 0; \\ T_n &= 2T_{n-1} + 1, \qquad \text{for } n > 0. \end{aligned} \tag{1.1}$$

(Notice that these formulas are consistent with the known values $T_1 = 1$ and
$T_2 = 3$. Our experience with small cases has not only helped us to discover
a general  formula,  it  has  also  provided  a convenient  way  to  check  that  we 
haven’t  made  a foolish  error.  Such  checks  will  be  especially  valuable  when  we 
get  into  more  complicated  maneuvers  in  later  chapters.) 

A set of equalities like (1.1) is called a recurrence (a.k.a. recurrence
relation  or  recursion  relation).  It  gives  a boundary  value  and  an  equation  for 
the  general  value  in  terms  of  earlier  ones.  Sometimes  we  refer  to  the  general 
equation  alone  as  a recurrence,  although  technically  it  needs  a boundary  value 
to  be  complete. 

The recurrence allows us to compute $T_n$ for any $n$ we like. But nobody
really likes to compute from a recurrence, when $n$ is large; it takes too long.
The recurrence only gives indirect, “local” information. A solution to the
recurrence would make us much happier. That is, we’d like a nice, neat,
“closed form” for $T_n$ that lets us compute it quickly, even for large $n$. With
a closed form, we can understand what $T_n$ really is.

So  how  do  we  solve  a recurrence?  One  way  is  to  guess  the  correct  solution, 
then  to  prove  that  our  guess  is  correct.  And  our  best  hope  for  guessing 
the  solution  is  to  look  (again)  at  small  cases.  So  we  compute,  successively, 
$T_3 = 2\cdot3 + 1 = 7$; $T_4 = 2\cdot7 + 1 = 15$; $T_5 = 2\cdot15 + 1 = 31$; $T_6 = 2\cdot31 + 1 = 63$.
Aha! It certainly looks as if

$$T_n = 2^n - 1, \qquad \text{for } n \ge 0. \tag{1.2}$$

At least this works for $n \le 6$.
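
The small-case check is easy to mechanize. The Python sketch below is an illustration, not part of the original text: it iterates recurrence (1.1) and prints each value next to the guessed closed form. The function name `T` is chosen here only to match the notation.

```python
def T(n):
    """Compute T_n by iterating recurrence (1.1): T_0 = 0, T_n = 2*T_{n-1} + 1."""
    t = 0
    for _ in range(n):
        t = 2 * t + 1
    return t


# Compare the recurrence with the guessed closed form 2^n - 1 for n = 0, ..., 6.
for n in range(7):
    print(n, T(n), 2**n - 1)
```

Agreement on finitely many cases is evidence, not proof; that gap is what the induction argument closes.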

Mathematical induction is a general way to prove that some statement
about the integer $n$ is true for all $n \ge n_0$. First we prove the statement
when $n$ has its smallest value, $n_0$; this is called the basis. Then we prove the
statement for $n > n_0$, assuming that it has already been proved for all values
between $n_0$ and $n-1$, inclusive; this is called the induction. Such a proof
gives  infinitely  many  results  with  only  a finite  amount  of  work. 

Recurrences  are  ideally  set  up  for  mathematical  induction.  In  our  case, 
for example, (1.2) follows easily from (1.1): The basis is trivial, since $T_0 =
2^0 - 1 = 0$. And the induction follows for $n > 0$ if we assume that (1.2) holds
when $n$ is replaced by $n-1$:

$$T_n = 2T_{n-1} + 1 = 2(2^{n-1} - 1) + 1 = 2^n - 1.$$

Hence (1.2) holds for $n$ as well. Good! Our quest for $T_n$ has ended successfully.
Of course the priests’ task hasn’t ended; they’re still dutifully moving
disks, and will be for a while, because for $n = 64$ there are $2^{64} - 1$ moves (about
18 quintillion). Even at the impossible rate of one move per microsecond, they
will need more than 5000 centuries to transfer the Tower of Brahma. Lucas’s
original puzzle is a bit more practical. It requires $2^8 - 1 = 255$ moves, which
takes about four minutes for the quick of hand.
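
The arithmetic behind these estimates can be checked in a few lines. This Python sketch is an illustration, not part of the original text; the year length of 365.25 days is an assumption made for the conversion, and the variable names are hypothetical.

```python
# Tower of Brahma: 2^64 - 1 moves at one move per microsecond.
MOVES_BRAHMA = 2**64 - 1
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # assumption: a year of 365.25 days

seconds = MOVES_BRAHMA / 1_000_000          # one million moves per second
centuries = seconds / SECONDS_PER_YEAR / 100
print(MOVES_BRAHMA)
print(centuries > 5000)

# Lucas's eight-disk puzzle: 2^8 - 1 moves spread over four minutes.
MOVES_HANOI = 2**8 - 1
moves_per_second = MOVES_HANOI / (4 * 60)
print(MOVES_HANOI, moves_per_second)
```

Under these assumptions the Tower of Brahma works out to somewhat under 6000 centuries, and the eight-disk puzzle to a little more than one move per second.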

The  Tower  of  Hanoi  recurrence  is  typical  of  many  that  arise  in  applica- 
tions of  all  kinds.  In  finding  a closed-form  expression  for  some  quantity  of 
interest like $T_n$ we go through three stages:

1 Look  at  small  cases.  This  gives  us  insight  into  the  problem  and  helps  us 
in  stages  2 and  3. 

2 Find  and  prove  a mathematical  expression  for  the  quantity  of  interest. 
For  the  Tower  of  Hanoi,  this  is  the  recurrence  (1.1)  that  allows  us,  given 
the inclination, to compute $T_n$ for any $n$.

3 Find  and  prove  a closed  form  for  our  mathematical  expression.  For  the 
Tower  of  Hanoi,  this  is  the  recurrence  solution  (1.2). 

The  third  stage  is  the  one  we  will  concentrate  on  throughout  this  book.  In 
fact,  we’ll  frequently  skip  stages  1 and  2 entirely,  because  a mathematical 
expression  will  be  given  to  us  as  a starting  point.  But  even  then,  we’ll  be 
getting  into  subproblems  whose  solutions  will  take  us  through  all  three  stages. 

Our  analysis  of  the  Tower  of  Hanoi  led  to  the  correct  answer,  but  it 
required  an  “inductive  leap”;  we  relied  on  a lucky  guess  about  the  answer. 
One  of  the  main  objectives  of  this  book  is  to  explain  how  a person  can  solve 
recurrences  without  being  clairvoyant.  For  example,  we’ll  see  that  recurrence 
(1.1)  can  be  simplified  by  adding  1 to  both  sides  of  the  equations: 


What  is  a proof? 
"One half of one
percent pure alcohol."


    $T_0 + 1 = 1$;
    $T_n + 1 = 2T_{n-1} + 2$,  for $n > 0$.

Now if we let $U_n = T_n + 1$, we have

    $U_0 = 1$;
    $U_n = 2U_{n-1}$,  for $n > 0$.    (1.3)


Interesting: We get
rid of the +1 in
(1.1) by adding, not
by subtracting.


It doesn't take genius to discover that the solution to this recurrence is just
$U_n = 2^n$; hence $T_n = 2^n - 1$. Even a computer could discover this.
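
A computer can at least check the answer. The following is a minimal Python sketch, not anything from the text itself: the function name `hanoi_moves` is our own, and we take recurrence (1.1) to be $T_0 = 0$, $T_n = 2T_{n-1} + 1$, which is what the equations above reduce to once the 1 is subtracted again.

```python
def hanoi_moves(n):
    """Iterate the assumed recurrence (1.1): T_0 = 0, T_n = 2*T_{n-1} + 1."""
    moves = 0
    for _ in range(n):
        moves = 2 * moves + 1
    return moves


# Compare the recurrence with the closed form T_n = 2^n - 1 for small n.
for n in range(20):
    assert hanoi_moves(n) == 2**n - 1

print(hanoi_moves(8))   # Lucas's puzzle: 255
print(hanoi_moves(64))  # Tower of Brahma: 2^64 - 1
```

The loop is only a spot check for small $n$; the induction proof is what establishes the formula in general.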


1.2  LINES  IN  THE  PLANE 

Our second sample problem has a more geometric flavor: How many
slices of pizza can a person obtain by making $n$ straight cuts with a pizza
knife? Or, more academically: What is the maximum number $L_n$ of regions
defined by $n$ lines in the plane? This problem was first solved in 1826, by the
Swiss mathematician Jacob Steiner [278].

(A pizza with Swiss cheese?)

Again we start by looking at small cases, remembering to begin with the
smallest of all. The plane with no lines has one region; with one line it has
two regions; and with two lines it has four regions:


[Figure: the plane divided by zero, one, and two lines, with $L_0 = 1$, $L_1 = 2$, $L_2 = 4$.]


A region  is  convex 
if  it  includes  all 
line segments between
any two of its
points.  (That’s  not 
what  my  dictionary 
says,  but  it’s  what 
mathematicians 
believe.) 


(Each  line  extends  infinitely  in  both  directions.) 

Sure, we think, $L_n = 2^n$; of course! Adding a new line simply doubles
the number of regions. Unfortunately this is wrong. We could achieve the
doubling if the $n$th line would split each old region in two; certainly it can
split an old region in at most two pieces, since each old region is convex. (A
straight line can split a convex region into at most two new regions, which
will also be convex.) But when we add the third line (the thick one in the
diagram below) we soon find that it can split at most three of the old regions,
no matter how we've placed the first two lines:


Thus  L3  = 4 + 3 = 7 is  the  best  we  can  do. 

And after some thought we realize the appropriate generalization. The
$n$th line (for $n > 0$) increases the number of regions by $k$ if and only if it
splits $k$ of the old regions, and it splits $k$ old regions if and only if it hits the
previous lines in $k - 1$ different places. Two lines can intersect in at most one
point. Therefore the new line can intersect the $n - 1$ old lines in at most $n - 1$
different points, and we must have $k \le n$. We have established the upper
bound

    $L_n \le L_{n-1} + n$,  for $n > 0$.

Furthermore it's easy to show by induction that we can achieve equality in
this formula. We simply place the $n$th line in such a way that it's not parallel
to any of the others (hence it intersects them all), and such that it doesn't go
through any of the existing intersection points (hence it intersects them all
in different places). The recurrence is therefore

    $L_0 = 1$;
    $L_n = L_{n-1} + n$,  for $n > 0$.    (1.4)

The known values of $L_1$, $L_2$, and $L_3$ check perfectly here, so we'll buy this.

Now  we  need  a closed-form  solution.  We  could  play  the  guessing  game 
again,  but  1, 2,  4,  7,  11,  16,  . . . doesn’t  look  familiar;  so  let’s  try  another 
tack.  We  can  often  understand  a recurrence  by  “unfolding”  or  “unwinding” 
it  all  the  way  to  the  end,  as  follows: 


    $L_n = L_{n-1} + n$
         $= L_{n-2} + (n-1) + n$
         $= L_{n-3} + (n-2) + (n-1) + n$
         $\vdots$
         $= L_0 + 1 + 2 + \cdots + (n-2) + (n-1) + n$
         $= 1 + S_n$,  where $S_n = 1 + 2 + 3 + \cdots + (n-1) + n$.


Unfolding?
I'd call this
"plugging in."

In  other  words,  Ln  is  one  more  than  the  sum  Sn  of  the  first  n positive  integers. 

The  quantity  Sn  pops  up  now  and  again,  so  it’s  worth  making  a table  of 
small  values.  Then  we  might  recognize  such  numbers  more  easily  when  we 
see  them  the  next  time: 


    n      1   2   3   4   5   6   7   8   9  10  11  12  13   14
    S_n    1   3   6  10  15  21  28  36  45  55  66  78  91  105

These values are also called the triangular numbers, because $S_n$ is the number
of bowling pins in an $n$-row triangular array. For example, the usual four-row
array has $S_4 = 10$ pins.

To  evaluate  Sn  we  can  use  a trick  that  Gauss  reportedly  came  up  with 
in  1786,  when  he  was  nine  years  old  [73]  (see  also  Euler  [92,  part  1,  §415]): 

       $S_n = 1 + 2 + 3 + \cdots + (n-1) + n$
     $+ S_n = n + (n-1) + (n-2) + \cdots + 2 + 1$
      $2S_n = (n+1) + (n+1) + (n+1) + \cdots + (n+1) + (n+1)$

We merely add $S_n$ to its reversal, so that each of the $n$ columns on the right
sums to $n + 1$. Simplifying,


It  seems  a lot  of 
stuff  is  attributed 
to Gauss;
either he was really
smart  or  he  had  a 
great  press  agent. 

Maybe  he  just 
had  a magnetic 
personality. 


    $S_n = \frac{n(n+1)}{2}$,  for $n \ge 0$.    (1.5)


Actually Gauss is
often called the
greatest mathematician
of all time.
So it's nice to be
able to understand
at least one of his
discoveries.


When  in  doubt, 
look  at  the  words. 
Why is it "closed,"
as opposed to
"open"? What
image does it bring
to mind?

Answer: The equation
is "closed," not
defined in terms of
itself, not leading
to recurrence. The
case is "closed"; it
won't happen again.
Metaphors  are  the 
key. 


Is  “zig”  a technical 
term? 


OK,  we  have  our  solution: 

    $L_n = \frac{n(n+1)}{2} + 1$,  for $n \ge 0$.    (1.6)

As  experts,  we  might  be  satisfied  with  this  derivation  and  consider  it 
a proof,  even  though  we  waved  our  hands  a bit  when  doing  the  unfolding 
and  reflecting.  But  students  of  mathematics  should  be  able  to  meet  stricter 
standards;  so  it’s  a good  idea  to  construct  a rigorous  proof  by  induction.  The 
key  induction  step  is 

    $L_n = L_{n-1} + n = \bigl(\tfrac12 (n-1)n + 1\bigr) + n = \tfrac12 n(n+1) + 1$.

Now  there  can  be  no  doubt  about  the  closed  form  (1.6). 
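
For readers who like to see the numbers agree, here is a small Python sketch under our own naming (`max_regions` and `closed_form` are illustrative names, not the book's). It unfolds the recurrence $L_0 = 1$, $L_n = L_{n-1} + n$ literally and compares the result with $n(n+1)/2 + 1$.

```python
def max_regions(n):
    """Unfold L_0 = 1, L_n = L_{n-1} + n: start at 1 and add 1, 2, ..., n."""
    regions = 1
    for k in range(1, n + 1):
        regions += k
    return regions


def closed_form(n):
    """The closed form L_n = n(n+1)/2 + 1, in exact integer arithmetic."""
    return n * (n + 1) // 2 + 1


print([max_regions(n) for n in range(6)])  # [1, 2, 4, 7, 11, 16]
assert all(max_regions(n) == closed_form(n) for n in range(200))
```

The integer division is exact because one of $n$ and $n + 1$ is always even.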

Incidentally we've been talking about "closed forms" without explicitly
saying what we mean. Usually it's pretty clear. Recurrences like (1.1)
and (1.4) are not in closed form; they express a quantity in terms of itself.
But solutions like (1.2) and (1.6) are. Sums like $1 + 2 + \cdots + n$ are not in
closed form (they cheat by using "$\cdots$"), but expressions like $n(n+1)/2$ are.

We could give a rough definition like this: An expression for a quantity $f(n)$
is in closed form if we can compute it using at most a fixed number of "well
known" standard operations, independent of $n$. For example, $2^n - 1$ and
$n(n+1)/2$ are closed forms because they involve only addition, subtraction,
multiplication, division, and exponentiation, in explicit ways.

The total number of simple closed forms is limited, and there are recurrences
that don't have simple closed forms. When such recurrences turn out
to be important, because they arise repeatedly, we add new operations to our
repertoire; this can greatly extend the range of problems solvable in "simple"
closed form. For example, the product of the first $n$ integers, $n!$, has proved
to be so important that we now consider it a basic operation. The formula
'$n!$' is therefore in closed form, although its equivalent '$1 \cdot 2 \cdot \ldots \cdot n$' is not.

And now, briefly, a variation of the lines-in-the-plane problem: Suppose
that instead of straight lines we use bent lines, each containing one "zig."
What is the maximum number $Z_n$ of regions determined by $n$ such bent lines
in the plane? We might expect $Z_n$ to be about twice as big as $L_n$, or maybe
three times as big. Let's see:

From these small cases, and after a little thought, we realize that a bent
line is like two straight lines except that regions merge when the "two" lines
don't extend past their intersection point.


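[Figure: two crossing lines and the regions they define, numbered 1 to 4; regions 2, 3, and 4 merge when the lines stop at their intersection point.]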


Regions 2, 3, and 4, which would be distinct with two lines, become a single
region when there's a bent line; we lose two regions. However, if we arrange
things properly (the zig point must lie "beyond" the intersections with the
other lines) that's all we lose; that is, we lose only two regions per line. Thus

    $Z_n = L_{2n} - 2n = 2n(2n+1)/2 + 1 - 2n$
         $= 2n^2 - n + 1$,  for $n \ge 0$.    (1.7)

Comparing  the  closed  forms  (1.6)  and  (1.7),  we  find  that  for  large  n, 

    $L_n \sim \tfrac12 n^2$,

    $Z_n \sim 2n^2$;

so  we  get  about  four  times  as  many  regions  with  bent  lines  as  with  straight 
lines.  (In  later  chapters  we’ll  be  discussing  how  to  analyze  the  approximate 
behavior  of  integer  functions  when  n is  large.) 


1.3  THE  JOSEPHUS  PROBLEM 

Our final introductory example is a variant of an ancient problem
named for Flavius Josephus, a famous historian of the first century. Legend
has it that Josephus wouldn't have lived to become famous without his mathematical
talents. During the Jewish-Roman war, he was among a band of 41
Jewish rebels trapped in a cave by the Romans. Preferring suicide to capture,
the rebels decided to form a circle and, proceeding around it, to kill every
third remaining person until no one was left. But Josephus, along with an
unindicted co-conspirator, wanted none of this suicide nonsense; so he quickly
calculated where he and his friend should stand in the vicious circle.

. . . and a little
afterthought . . .


Exercise 18 has the
details.


(Ahrens [5, vol. 2]
and Herstein
and Kaplansky [156]
discuss the interesting
history of this
problem. Josephus
himself [166] is a bit
vague.)


. . . thereby saving
his tale for us to
hear.


Here's a case where
n = 0 makes no
sense.


Even so, a bad
guess isn't a waste
of time, because it
gets us involved in
the problem.


This is the tricky
part: We have
J(2n) = newnumber(J(n)),
where
newnumber(k) = 2k - 1.


In our variation, we start with $n$ people numbered 1 to $n$ around a circle,
and we eliminate every second remaining person until only one survives. For
example, here's the starting configuration for $n = 10$:


The  elimination  order  is  2,  4,  6,  8,  10,  3,  7,  1,  9,  so  5 survives.  The  problem: 
Determine  the  survivor’s  number,  J(n). 

We just saw that $J(10) = 5$. We might conjecture that $J(n) = n/2$ when
$n$ is even; and the case $n = 2$ supports the conjecture: $J(2) = 1$. But a few
other small cases dissuade us; the conjecture fails for $n = 4$ and $n = 6$.
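
Small cases like these can be generated by brute force. The sketch below is our own illustration (the name `josephus_order` and the use of a deque are assumptions, not part of the text): it simulates the circle directly, skipping one person and eliminating the next until a single person remains.

```python
from collections import deque


def josephus_order(n):
    """Simulate the circle; return the elimination order with the survivor last."""
    circle = deque(range(1, n + 1))
    order = []
    while len(circle) > 1:
        circle.rotate(-1)               # the person at the front is skipped
        order.append(circle.popleft())  # the next person is eliminated
    order.append(circle[0])             # whoever is left survives
    return order


print(josephus_order(10))                              # [2, 4, 6, 8, 10, 3, 7, 1, 9, 5]
print([josephus_order(n)[-1] for n in range(1, 7)])    # [1, 1, 3, 1, 3, 5]
```

Simulation costs about $n$ steps per value of $n$, which is fine for building a table of small cases but says nothing about why the answers look the way they do.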


    n      1  2  3  4  5  6
    J(n)   1  1  3  1  3  5

It’s  back  to  the  drawing  board;  let’s  try  to  make  a better  guess.  Hmmm  . . . 
J(n)  always  seems  to  be  odd.  And  in  fact,  there’s  a good  reason  for  this:  The 
first  trip  around  the  circle  eliminates  all  the  even  numbers.  Furthermore,  if 
n itself  is  an  even  number,  we  arrive  at  a situation  similar  to  what  we  began 
with,  except  that  there  are  only  half  as  many  people,  and  their  numbers  have 
changed. 

So  let’s  suppose  that  we  have  2n  people  originally.  After  the  first  go- 
round,  we’re  left  with 


[Figure: the people remaining after the first go-round, numbered 1, 3, 5, 7, ..., 2n - 3, 2n - 1, around the circle.]


and  3 will  be  the  next  to  go.  This  is  just  like  starting  out  with  n people,  except 
that  each  person’s  number  has  been  doubled  and  decreased  by  1.  That  is, 

    $J(2n) = 2J(n) - 1$,  for $n \ge 1$.

We can now go quickly to large $n$. For example, we know that $J(10) = 5$, so

    $J(20) = 2J(10) - 1 = 2 \cdot 5 - 1 = 9$.

Similarly $J(40) = 17$, and we can deduce that $J(5 \cdot 2^m) = 2^{m+1} + 1$.

Odd case? Hey,
leave my brother
out of it.

But what about the odd case? With $2n + 1$ people, it turns out that
person number 1 is wiped out just after person number $2n$, and we're left with


Again  we  almost  have  the  original  situation  with  n people,  but  this  time  their 
numbers  are  doubled  and  increased  by  1.  Thus 

    $J(2n+1) = 2J(n) + 1$,  for $n \ge 1$.

Combining these equations with $J(1) = 1$ gives us a recurrence that defines $J$
in all cases:

    $J(1) = 1$;
    $J(2n) = 2J(n) - 1$,  for $n \ge 1$;    (1.8)
    $J(2n+1) = 2J(n) + 1$,  for $n \ge 1$.

Instead of getting $J(n)$ from $J(n-1)$, this recurrence is much more "efficient,"
because it reduces $n$ by a factor of 2 or more each time it's applied. We could
compute $J(1000000)$, say, with only 19 applications of (1.8). But still, we seek
a closed  form,  because  that  will  be  even  quicker  and  more  informative.  After 
all,  this  is  a matter  of  life  or  death. 
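
The count of 19 applications is easy to confirm. Here is a hedged Python sketch of recurrence (1.8); the helper name `josephus_with_count` is ours, and the second component of its result simply counts how many times one of the two doubling rules was used.

```python
def josephus_with_count(n):
    """Return (J(n), number of applications of the doubling rules in (1.8))."""
    if n == 1:
        return 1, 0                      # J(1) = 1 needs no doubling step
    half, is_odd = divmod(n, 2)
    survivor, steps = josephus_with_count(half)
    # J(2n) = 2J(n) - 1 for even arguments, J(2n+1) = 2J(n) + 1 for odd ones.
    return 2 * survivor + (1 if is_odd else -1), steps + 1


print(josephus_with_count(10))         # (5, 3)
print(josephus_with_count(1_000_000))  # (951425, 19)
```

Each step halves the argument, so the number of steps is the number of bits of $n$ after the leading one.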

Our  recurrence  makes  it  possible  to  build  a table  of  small  values  very 
quickly.  Perhaps  we’ll  be  able  to  spot  a pattern  and  guess  the  answer. 


    n      1 | 2  3 | 4  5  6  7 | 8  9  10  11  12  13  14  15 | 16
    J(n)   1 | 1  3 | 1  3  5  7 | 1  3   5   7   9  11  13  15 |  1

Voilà! It seems we can group by powers of 2 (marked by vertical lines in
the table); $J(n)$ is always 1 at the beginning of a group and it increases by 2
within a group. So if we write $n$ in the form $n = 2^m + l$, where $2^m$ is the
largest power of 2 not exceeding $n$ and where $l$ is what's left, the solution to
our recurrence seems to be

    $J(2^m + l) = 2l + 1$,  for $m \ge 0$ and $0 \le l < 2^m$.    (1.9)

(Notice that if $2^m \le n < 2^{m+1}$, the remainder $l = n - 2^m$ satisfies
$0 \le l < 2^{m+1} - 2^m = 2^m$.)

We must now prove (1.9). As in the past we use induction, but this time
the induction is on $m$. When $m = 0$ we must have $l = 0$; thus the basis of
(1.9) reduces to $J(1) = 1$, which is true. The induction step has two parts,
depending on whether $l$ is even or odd. If $m > 0$ and $2^m + l = 2n$, then $l$ is
even and

    $J(2^m + l) = 2J(2^{m-1} + l/2) - 1 = 2(2l/2 + 1) - 1 = 2l + 1$,

by (1.8) and the induction hypothesis; this is exactly what we want. A similar
proof works in the odd case, when $2^m + l = 2n + 1$. We might also note that
(1.8) implies the relation

    $J(2n+1) - J(2n) = 2$.

Either way, the induction is complete and (1.9) is established.


But there's a simpler
way! The
key fact is that
$J(2^m) = 1$ for
all $m$, and this
follows immediately
from our first
equation,
$J(2n) = 2J(n) - 1$.
Hence we know that
the first person will
survive whenever
$n$ is a power of 2.
And in the general
case, when
$n = 2^m + l$,
the number of
people is reduced
to a power of 2
after there have
been $l$ executions.
The first remaining
person at this point,
the survivor, is
number $2l + 1$.

To illustrate solution (1.9), let's compute $J(100)$. In this case we have
$100 = 2^6 + 36$, so $J(100) = 2 \cdot 36 + 1 = 73$.
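
In code the closed form (1.9) is a two-line computation. This is a minimal sketch with an assumed name, `josephus_closed`; it finds the largest power of 2 not exceeding $n$ from the bit length of $n$ and then returns $2l + 1$.

```python
def josephus_closed(n):
    """Closed form (1.9): write n = 2^m + l with 0 <= l < 2^m, return 2l + 1."""
    m = n.bit_length() - 1   # 2^m is the largest power of 2 not exceeding n
    l = n - (1 << m)         # what's left over
    return 2 * l + 1


print(josephus_closed(100))                          # 73
print([josephus_closed(n) for n in range(1, 17)])
# [1, 1, 3, 1, 3, 5, 7, 1, 3, 5, 7, 9, 11, 13, 15, 1]
```

The printed list reproduces the table of small values, with a fresh group starting at each power of 2.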

Now  that  we’ve  done  the  hard  stuff  (solved  the  problem)  we  seek  the 
soft:  Every  solution  to  a problem  can  be  generalized  so  that  it  applies  to  a 
wider  class  of  problems.  Once  we’ve  learned  a technique,  it’s  instructive  to 
look  at  it  closely  and  see  how  far  we  can  go  with  it.  Hence,  for  the  rest  of  this 
section,  we  will  examine  the  solution  (1.9)  and  explore  some  generalizations 
of  the  recurrence  (1.8).  These  explorations  will  uncover  the  structure  that 
underlies  all  such  problems. 

Powers  of  2 played  an  important  role  in  our  finding  the  solution,  so  it’s 
natural  to  look  at  the  radix  2 representations  of  n and  J(n).  Suppose  n’s 
binary  expansion  is 

    $n = (b_m b_{m-1} \ldots b_1 b_0)_2$;

that is,

    $n = b_m 2^m + b_{m-1} 2^{m-1} + \cdots + b_1 2 + b_0$,

where each $b_i$ is either 0 or 1 and where the leading bit $b_m$ is 1. Recalling
that $n = 2^m + l$, we have, successively,

    $n = (1\, b_{m-1} b_{m-2} \ldots b_1 b_0)_2$,
    $l = (0\, b_{m-1} b_{m-2} \ldots b_1 b_0)_2$,
    $2l = (b_{m-1} b_{m-2} \ldots b_1 b_0\, 0)_2$,
    $2l + 1 = (b_{m-1} b_{m-2} \ldots b_1 b_0\, 1)_2$,
    $J(n) = (b_{m-1} b_{m-2} \ldots b_1 b_0\, b_m)_2$.

(The last step follows because $J(n) = 2l + 1$ and because $b_m = 1$.) We have
proved that

    $J((b_m b_{m-1} \ldots b_1 b_0)_2) = (b_{m-1} \ldots b_1 b_0 b_m)_2$;    (1.10)

that is, in the lingo of computer programming, we get $J(n)$ from $n$ by doing
a one-bit cyclic shift left! Magic. For example, if $n = 100 = (1100100)_2$ then
$J(n) = J((1100100)_2) = (1001001)_2$, which is $64 + 8 + 1 = 73$. If we had been
working all along in binary notation, we probably would have spotted this
pattern immediately.
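
The cyclic shift is easy to try out on binary strings. The following Python sketch is only an illustration under our own naming (`josephus_shift`); it moves the leading bit of $n$ to the right-hand end and reads the result back as a number, which silently drops any leading zeros.

```python
def josephus_shift(n):
    """J(n) via (1.10): rotate the binary digits of n one place to the left."""
    bits = bin(n)[2:]                  # binary digits; the leading digit is '1'
    return int(bits[1:] + bits[0], 2)  # leading zeros vanish in the conversion


print(bin(100), "->", bin(josephus_shift(100)))  # 0b1100100 -> 0b1001001
print(josephus_shift(100))                       # 73
```

For example `josephus_shift(13)` gives 11, since $(1101)_2$ becomes $(1011)_2$.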

If we start with $n$ and iterate the $J$ function $m + 1$ times, we're doing
$m + 1$ one-bit cyclic shifts; so, since $n$ is an $(m+1)$-bit number, we might
expect to end up with $n$ again. But this doesn't quite work. For instance
if $n = 13$ we have $J((1101)_2) = (1011)_2$, but then $J((1011)_2) = (111)_2$ and
the process breaks down; the 0 disappears when it becomes the leading bit.
In fact, $J(n)$ must always be $\le n$ by definition, since $J(n)$ is the survivor's
number; hence if $J(n) < n$ we can never get back up to $n$ by continuing to
iterate.

("Iteration" means
applying a function
to itself.)

Repeated application of $J$ produces a sequence of decreasing values that
eventually reach a "fixed point," where $J(n) = n$. The cyclic shift property
makes it easy to see what that fixed point will be: Iterating the function
enough times will always produce a pattern of all 1's whose value is $2^{\nu(n)} - 1$,
where $\nu(n)$ is the number of 1 bits in the binary representation of $n$. Thus,
since $\nu(13) = 3$, we have

    $\overbrace{J(J(\ldots J}^{\text{2 or more } J\text{'s}}(13) \ldots)) = 2^3 - 1 = 7$;

similarly

    $\overbrace{J(J(\ldots J}^{\text{8 or more}}((101101101101011)_2) \ldots)) = 2^{10} - 1 = 1023$.

Curious,  but  true. 
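
Those who doubt it can iterate and watch. This Python sketch is ours, not the text's: `josephus` uses the closed form $J(2^m + l) = 2l + 1$, and `fixed_point` (a hypothetical helper) applies $J$ until the value stops changing, reporting how many applications actually changed something.

```python
def josephus(n):
    """J(n) = 2l + 1, where n = 2^m + l and 0 <= l < 2^m."""
    m = n.bit_length() - 1
    return 2 * (n - (1 << m)) + 1


def fixed_point(n):
    """Apply J repeatedly; return (fixed point, number of value-changing steps)."""
    steps = 0
    while josephus(n) != n:
        n = josephus(n)
        steps += 1
    return n, steps


def ones(n):
    """nu(n): the number of 1 bits in the binary representation of n."""
    return bin(n).count("1")


print(fixed_point(13))                   # (7, 2)
print(fixed_point(0b101101101101011))    # (1023, 8)
assert all(fixed_point(n)[0] == 2 ** ones(n) - 1 for n in range(1, 5000))
```

The step counts 2 and 8 agree with the labels "2 or more" and "8 or more": further applications of $J$ leave the all-ones pattern alone.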

Let’s  return  briefly  to  our  first  guess,  that  J(n)  = n/2  when  n is  even. 
This  is  obviously  not  true  in  general,  but  we  can  now  determine  exactly  when 
it  is  true: 


Curiously enough,
if $M$ is a compact
$C^\infty$ $n$-manifold
($n > 1$), there
exists a differentiable
immersion of
$M$ into $R^{2n-\nu(n)}$
but not necessarily
into $R^{2n-\nu(n)-1}$.

I wonder if Josephus
was secretly
a topologist?


    $J(n) = n/2$,
    $2l + 1 = (2^m + l)/2$,
    $l = \tfrac13 (2^m - 2)$.

If this number $l = \tfrac13 (2^m - 2)$ is an integer, then $n = 2^m + l$ will be a solution,
because $l$ will be less than $2^m$. It's not hard to verify that $2^m - 2$ is a multiple
of 3 when $m$ is odd, but not when $m$ is even. (We will study such things
in Chapter 4.) Therefore there are infinitely many solutions to the equation
$J(n) = n/2$, beginning as follows:

    m     l    n = 2^m + l    J(n) = 2l + 1 = n/2    n (binary)
    1     0          2                  1                     10
    3     2         10                  5                   1010
    5    10         42                 21                 101010
    7    42        170                 85               10101010


Looks like Greek
to me.

Notice  the  pattern  in  the  rightmost  column.  These  are  the  binary  numbers 
for  which  cyclic-shifting  one  place  left  produces  the  same  result  as  ordinary- 
shifting  one  place  right  (halving). 
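
A brute-force search agrees with the table. The sketch below is a hedged illustration; `josephus` is our own name for the closed form $J(2^m + l) = 2l + 1$, and the search bound of 1000 is arbitrary.

```python
def josephus(n):
    """J(n) = 2l + 1, where n = 2^m + l and 0 <= l < 2^m."""
    m = n.bit_length() - 1
    return 2 * (n - (1 << m)) + 1


# Even n below 1000 for which the first guess J(n) = n/2 happens to be right.
solutions = [n for n in range(2, 1000, 2) if josephus(n) == n // 2]
for n in solutions:
    print(n, n // 2, bin(n)[2:])
# 2 1 10
# 10 5 1010
# 42 21 101010
# 170 85 10101010
# 682 341 1010101010
```

The fifth line, for $m = 9$, continues the alternating binary pattern one step past the table.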

OK, we understand the $J$ function pretty well; the next step is to generalize
it. What would have happened if our problem had produced a recurrence
that was something like (1.8), but with different constants? Then we might
not have been lucky enough to guess the solution, because the solution might
have been really weird. Let's investigate this by introducing constants $\alpha$, $\beta$,
and $\gamma$ and trying to find a closed form for the more general recurrence

    $f(1) = \alpha$;
    $f(2n) = 2f(n) + \beta$,  for $n \ge 1$;    (1.11)
    $f(2n+1) = 2f(n) + \gamma$,  for $n \ge 1$.

(Our original recurrence had $\alpha = 1$, $\beta = -1$, and $\gamma = 1$.) Starting with
$f(1) = \alpha$ and working our way up, we can construct the following general
table for small values of $n$:


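    n    f(n)
    1    $\alpha$
    2    $2\alpha + \beta$
    3    $2\alpha + \gamma$
    4    $4\alpha + 3\beta$
    5    $4\alpha + 2\beta + \gamma$    (1.12)
    6    $4\alpha + \beta + 2\gamma$
    7    $4\alpha + 3\gamma$
    8    $8\alpha + 7\beta$
    9    $8\alpha + 6\beta + \gamma$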

It seems that $\alpha$'s coefficient is $n$'s largest power of 2. Furthermore, between
powers of 2, $\beta$'s coefficient decreases by 1 down to 0 and $\gamma$'s increases by 1
up from 0. Therefore if we express $f(n)$ in the form

    $f(n) = A(n)\,\alpha + B(n)\,\beta + C(n)\,\gamma$,    (1.13)

by separating out its dependence on $\alpha$, $\beta$, and $\gamma$, it seems that

    $A(n) = 2^m$;
    $B(n) = 2^m - 1 - l$;    (1.14)
    $C(n) = l$.

Here, as usual, $n = 2^m + l$ and $0 \le l < 2^m$, for $n \ge 1$.

It's not terribly hard to prove (1.13) and (1.14) by induction, but the
calculations are messy and uninformative. Fortunately there's a better way
to proceed, by choosing particular values and then combining them. Let's
illustrate this by considering the special case $\alpha = 1$, $\beta = \gamma = 0$, when $f(n)$ is
supposed to be equal to $A(n)$: Recurrence (1.11) becomes

    $A(1) = 1$;
    $A(2n) = 2A(n)$,  for $n \ge 1$;
    $A(2n+1) = 2A(n)$,  for $n \ge 1$.

Sure enough, it's true (by induction on $m$) that $A(2^m + l) = 2^m$.

Next, let's use recurrence (1.11) and solution (1.13) in reverse, by starting
with a simple function $f(n)$ and seeing if there are any constants $(\alpha, \beta, \gamma)$
that will define it. Plugging in the constant function $f(n) = 1$ says that

    $1 = \alpha$;
    $1 = 2 \cdot 1 + \beta$;
    $1 = 2 \cdot 1 + \gamma$;

hence the values $(\alpha, \beta, \gamma) = (1, -1, -1)$ satisfying these equations will yield
$A(n) - B(n) - C(n) = f(n) = 1$. Similarly, we can plug in $f(n) = n$:

    $1 = \alpha$;
    $2n = 2 \cdot n + \beta$;
    $2n + 1 = 2 \cdot n + \gamma$.

These equations hold for all $n$ when $\alpha = 1$, $\beta = 0$, and $\gamma = 1$, so we don't
need to prove by induction that these parameters will yield $f(n) = n$. We
already know that $f(n) = n$ will be the solution in such a case, because the
recurrence (1.11) uniquely defines $f(n)$ for every value of $n$.

And now we're essentially done! We have shown that the functions $A(n)$,
$B(n)$, and $C(n)$ of (1.13), which solve (1.11) in general, satisfy the equations

    $A(n) = 2^m$,  where $n = 2^m + l$ and $0 \le l < 2^m$;
    $A(n) - B(n) - C(n) = 1$;
    $A(n) + C(n) = n$.

Our conjectures in (1.14) follow immediately, since we can solve these equations
to get $C(n) = n - A(n) = l$ and $B(n) = A(n) - 1 - C(n) = 2^m - 1 - l$.


Hold onto your
hats, this next part
is new stuff.


A neat idea!

This approach illustrates a surprisingly useful repertoire method for solving
recurrences. First we find settings of general parameters for which we
know the solution; this gives us a repertoire of special cases that we can solve.
Then we obtain the general case by combining the special cases. We need as
many independent special solutions as there are independent parameters (in
this case three, for $\alpha$, $\beta$, and $\gamma$). Exercises 16 and 20 provide further examples
of the repertoire approach.
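
The three relations that came out of the repertoire can also be checked numerically. The Python sketch below is an illustration under assumed names (`f` and `coefficients` are ours): because $f$ depends linearly on its three parameters, setting one of them to 1 and the others to 0 isolates $A(n)$, $B(n)$, or $C(n)$.

```python
def f(n, alpha, beta, gamma):
    """The generalized recurrence: f(1) = alpha, f(2n) = 2f(n) + beta,
    f(2n+1) = 2f(n) + gamma."""
    if n == 1:
        return alpha
    half, is_odd = divmod(n, 2)
    return 2 * f(half, alpha, beta, gamma) + (gamma if is_odd else beta)


def coefficients(n):
    """Isolate A(n), B(n), C(n) by switching on one parameter at a time."""
    return f(n, 1, 0, 0), f(n, 0, 1, 0), f(n, 0, 0, 1)


for n in range(1, 257):
    m = n.bit_length() - 1
    l = n - 2**m
    a, b, c = coefficients(n)
    assert (a, b, c) == (2**m, 2**m - 1 - l, l)  # the conjectured closed forms
    assert a - b - c == 1                        # from the special case f(n) = 1
    assert a + c == n                            # from the special case f(n) = n

print(coefficients(9))   # (8, 6, 1)
print(f(100, 1, -1, 1))  # 73, the original Josephus value J(100)
```

This is a spot check over a finite range rather than a proof; the argument in the text is what covers all $n$.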

We  know  that  the  original  J-recurrence  has  a magical  solution,  in  binary: 

    $J((b_m b_{m-1} \ldots b_1 b_0)_2) = (b_{m-1} \ldots b_1 b_0 b_m)_2$,  where $b_m = 1$.

Does the generalized Josephus recurrence admit of such magic?

Sure, why not? We can rewrite the generalized recurrence (1.11) as

    $f(1) = \alpha$;
    $f(2n + j) = 2f(n) + \beta_j$,  for $j = 0, 1$ and $n \ge 1$,    (1.15)

if we let $\beta_0 = \beta$ and $\beta_1 = \gamma$. And this recurrence unfolds, binary-wise:

    $f((b_m b_{m-1} \ldots b_1 b_0)_2) = 2f((b_m b_{m-1} \ldots b_1)_2) + \beta_{b_0}$
        $= 4f((b_m b_{m-1} \ldots b_2)_2) + 2\beta_{b_1} + \beta_{b_0}$
        $\vdots$
        $= 2^m f((b_m)_2) + 2^{m-1}\beta_{b_{m-1}} + \cdots + 2\beta_{b_1} + \beta_{b_0}$
        $= 2^m \alpha + 2^{m-1}\beta_{b_{m-1}} + \cdots + 2\beta_{b_1} + \beta_{b_0}$.

('relax' = 'destroy')

Suppose we now relax the radix 2 notation to allow arbitrary digits instead
of just 0 and 1. The derivation above tells us that

    $f((b_m b_{m-1} \ldots b_1 b_0)_2) = (\alpha\, \beta_{b_{m-1}} \beta_{b_{m-2}} \ldots \beta_{b_1} \beta_{b_0})_2$.    (1.16)

Nice. We would have seen this pattern earlier if we had written (1.12) in
another way:


I think I get it:

The binary representations
of A(n),
B(n), and C(n)
have 1's in different
positions.


Beware: The authors
are expecting
us to figure out
the idea of the
repertoire method
from seat-of-the-pants
examples,
instead of giving
us a top-down
presentation. The
method works best
with recurrences
that are "linear," in
the sense that their
solutions can be
expressed as a sum
of arbitrary parameters
multiplied by
functions of n, as
in (1.13). Equation
(1.13) is the key.

For example, when $n = 100 = (1100100)_2$, our original Josephus values
$\alpha = 1$, $\beta = -1$, and $\gamma = 1$ yield

    n    = ( 1    1    0    0    1    0    0 )_2 = 100
    f(n) = ( 1    1   -1   -1    1   -1   -1 )_2
         =  +64  +32  -16   -8   +4   -2   -1    = 73

as before. The cyclic-shift property follows because each block of binary digits
$(1\,0 \ldots 0\,0)_2$ in the representation of $n$ is transformed into

    $(1\ {-1} \ldots {-1}\ {-1})_2 = (0\,0 \ldots 0\,1)_2$.

So  our  change  of  notation  has  given  us  the  compact  solution  (1.16)  to  the 
general  recurrence  (1.15).  If  we’re  really  uninhibited  we  can  now  generalize 
even  more.  The  recurrence 

    $f(j) = \alpha_j$,  for $1 \le j < d$;
    $f(dn + j) = c\,f(n) + \beta_j$,  for $0 \le j < d$ and $n \ge 1$,    (1.17)

is the same as the previous one except that we start with numbers in radix $d$
and produce values in radix $c$. That is, it has the radix-changing solution

    $f((b_m b_{m-1} \ldots b_1 b_0)_d) = (\alpha_{b_m} \beta_{b_{m-1}} \beta_{b_{m-2}} \ldots \beta_{b_1} \beta_{b_0})_c$.    (1.18)

For  example,  suppose  that  by  some  stroke  of  luck  we’re  given  the  recurrence 


    $f(1) = 34$,
    $f(2) = 5$,
    $f(3n) = 10f(n) + 76$,  for $n \ge 1$,
    $f(3n+1) = 10f(n) - 2$,  for $n \ge 1$,
    $f(3n+2) = 10f(n) + 8$,  for $n \ge 1$,

and suppose we want to compute $f(19)$. Here we have $d = 3$ and $c = 10$. Now
$19 = (201)_3$, and the radix-changing solution tells us to perform a digit-by-digit
replacement from radix 3 to radix 10. So the leading 2 becomes a 5, and
the 0 and 1 become 76 and $-2$, giving

    $f(19) = f((201)_3) = (5\ \ 76\ \ {-2})_{10} = 1258$,

which  is  our  answer. 
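
Both routes to this answer fit in a few lines of Python. The sketch below is a hedged illustration: the names `f_recurrence` and `f_radix`, and the lookup tables `ALPHA` and `BETA`, are ours. One function follows the five-line recurrence literally; the other writes $n$ in radix 3, replaces each digit, and evaluates the result in radix 10 with "digits" that need not lie between 0 and 9.

```python
ALPHA = {1: 34, 2: 5}   # replacements for the leading radix-3 digit
BETA = (76, -2, 8)      # replacements for the other digits 0, 1, 2


def f_recurrence(n):
    """Follow the example recurrence literally."""
    if n in ALPHA:
        return ALPHA[n]
    quotient, digit = divmod(n, 3)
    return 10 * f_recurrence(quotient) + BETA[digit]


def f_radix(n):
    """Radix-changing solution: replace radix-3 digits, evaluate in radix 10."""
    digits = []
    while n:
        n, digit = divmod(n, 3)
        digits.append(digit)
    digits.reverse()                 # most significant digit first
    value = ALPHA[digits[0]]
    for digit in digits[1:]:
        value = 10 * value + BETA[digit]
    return value


print(f_recurrence(19), f_radix(19))  # 1258 1258
assert all(f_recurrence(n) == f_radix(n) for n in range(1, 1000))
```

The two functions do the same arithmetic in opposite orders, which is exactly what unfolding the recurrence digit by digit amounts to.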

Thus  Josephus  and  the  Jewish-Roman  war  have  led  us  to  some  interesting 
general  recurrences. 


"There are two
kinds of generalizations.
One is
cheap and the other
is valuable.

It is easy to generalize
by diluting
a little idea with a
big terminology.

It is much more
difficult to prepare
a refined and
condensed extract
from several good
ingredients."

- G. Pólya [238]


Perhaps  this  was  a 
stroke  of  bad  luck. 


But in general I'm
against recurrences
of war.


Exercises 

Warmups 




Please  do  all  the 
warmups  in  all  the 
chapters! 

— The  Mgm ’t 


1 All horses are the same color; we can prove this by induction on the
number of horses in a given set. Here's how: "If there's just one horse
then it's the same color as itself, so the basis is trivial. For the induction
step, assume that there are $n$ horses numbered 1 to $n$. By the induction
hypothesis, horses 1 through $n - 1$ are the same color, and similarly
horses 2 through $n$ are the same color. But the middle horses, 2 through
$n - 1$, can't change color when they're in different groups; these are
horses, not chameleons. So horses 1 and $n$ must be the same color as
well, by transitivity. Thus all $n$ horses are the same color; QED." What,
if anything, is wrong with this reasoning?


2  Find  the  shortest  sequence  of  moves  that  transfers  a tower  of  n disks 
from  the  left  peg  A to  the  right  peg  B,  if  direct  moves  between  A and  B 
are  disallowed.  (Each  move  must  be  to  or  from  the  middle  peg.  As  usual, 
a larger  disk  must  never  appear  above  a smaller  one.) 


3 Show  that,  in  the  process  of  transferring  a tower  under  the  restrictions  of 
the  preceding  exercise,  we  will  actually  encounter  every  properly  stacked 
arrangement  of  n disks  on  three  pegs. 

4 Are there any starting and ending configurations of $n$ disks on three pegs
that are more than $2^n - 1$ moves apart, under Lucas's original rules?


5  A “Venn  diagram’’  with  three  overlapping  circles  is  often  used  to  illustrate 
the  eight  possible  subsets  associated  with  three  given  sets: 


Can  the  sixteen  possibilities  that  arise  with  four  given  sets  be  illustrated 
by  four  overlapping  circles? 

6 Some  of  the  regions  defined  by  n lines  in  the  plane  are  infinite,  while 
others  are  bounded.  What’s  the  maximum  possible  number  of  bounded 
regions? 

7 Let $H(n) = J(n+1) - J(n)$. Equation (1.8) tells us that $H(2n) = 2$, and

    $H(2n+1) = J(2n+2) - J(2n+1) = (2J(n+1) - 1) - (2J(n) + 1) = 2H(n) - 2$,

for all $n \ge 1$. Therefore it seems possible to prove that $H(n) = 2$ for all $n$,
by induction on $n$. What's wrong here?

Homework exercises

8 Solve the recurrence

    $Q_0 = \alpha$;  $Q_1 = \beta$;
    $Q_n = (1 + Q_{n-1})/Q_{n-2}$,  for $n > 1$.

Assume that $Q_n \ne 0$ for all $n \ge 0$. Hint: $Q_4 = (1 + \alpha)/\beta$.

9 Sometimes it's possible to use induction backwards, proving things from
$n$ to $n - 1$ instead of vice versa! For example, consider the statement

    $P(n)$:  $x_1 \ldots x_n \le \left(\frac{x_1 + \cdots + x_n}{n}\right)^n$,  if $x_1, \ldots, x_n \ge 0$.

This is true when $n = 2$, since $(x_1 + x_2)^2 - 4x_1x_2 = (x_1 - x_2)^2 \ge 0$.
a By setting $x_n = (x_1 + \cdots + x_{n-1})/(n-1)$, prove that $P(n)$ implies
$P(n-1)$ whenever $n > 1$.
b Show that $P(n)$ and $P(2)$ imply $P(2n)$.
c Explain why this implies the truth of $P(n)$ for all $n$.

10 Let $Q_n$ be the minimum number of moves needed to transfer a tower of
$n$ disks from A to B if all moves must be clockwise, that is, from A
to B, or from B to the other peg, or from the other peg to A. Also let $R_n$
be the minimum number of moves needed to go from B back to A under
this restriction. Prove that

    $Q_n = \begin{cases} 0, & \text{if } n = 0; \\ 2R_{n-1} + 1, & \text{if } n > 0; \end{cases}$
    $R_n = \begin{cases} 0, & \text{if } n = 0; \\ Q_n + Q_{n-1} + 1, & \text{if } n > 0. \end{cases}$

(You need not solve these recurrences; we'll see how to do that in Chapter 7.)

11 A Double Tower of Hanoi contains 2n disks of n different sizes, two of
each size. As usual, we’re required to move only one disk at a time,
without putting a larger one over a smaller one.

a How many moves does it take to transfer a double tower from one
peg to another, if disks of equal size are indistinguishable from each
other?

b What if we are required to reproduce the original top-to-bottom
order of all the equal-size disks in the final arrangement? [Hint:
This is difficult; it’s really a “bonus problem.”]

12 Let’s generalize exercise 11a even further, by assuming that there are
m different sizes of disks and exactly $n_k$ disks of size k. Determine
$A(n_1, \ldots, n_m)$, the minimum number of moves needed to transfer a tower
when equal-size disks are considered to be indistinguishable.


now  that's  a 
horse  of  a different 
color. 




Good luck keeping
the cheese in
position.


Is  this  like  a 
five-star  general 
recurrence? 


13 What’s the maximum number of regions definable by n zig-zag lines,

[Figure: two zig-zag lines, illustrating $ZZ_2 = 12$]

each of which consists of two parallel infinite half-lines joined by a straight
segment?

14  How  many  pieces  of  cheese  can  you  obtain  from  a single  thick  piece  by 
making  five  straight  slices?  (The  cheese  must  stay  in  its  original  position 
while  you  do  all  the  cutting,  and  each  slice  must  correspond  to  a plane 
in 3D.) Find a recurrence relation for $P_n$, the maximum number of
three-dimensional regions that can be defined by n different planes.

15  Josephus  had  a friend  who  was  saved  by  getting  into  the  next-to-last 
position.  What  is  I(n),  the  number  of  the  penultimate  survivor  when 
every  second  person  is  executed? 

16 Use the repertoire method to solve the general four-parameter recurrence

$$g(1) = \alpha;$$

$$g(2n + j) = 3g(n) + \gamma n + \beta_j, \qquad \text{for } j = 0, 1 \text{ and } n \ge 1.$$

Hint: Try the function g(n) = n.

Exam  problems 

17 If $W_n$ is the minimum number of moves needed to transfer a tower of n
disks from one peg to another when there are four pegs instead of three,
show that

$$W_{n(n+1)/2} \le 2W_{n(n-1)/2} + T_n, \qquad \text{for } n > 0.$$

(Here $T_n = 2^n - 1$ is the ordinary three-peg number.) Use this to find a
closed form f(n) such that $W_{n(n+1)/2} \le f(n)$ for all $n \ge 0$.

18 Show that the following set of n bent lines defines $Z_n$ regions, where $Z_n$
is defined in (1.7): The jth bent line, for $1 \le j \le n$, has its zig at $(n^{2j}, 0)$
and goes up through the points $(n^{2j} - n^j, 1)$ and $(n^{2j} - n^j - n^{-n}, 1)$.

19  Is  it  possible  to  obtain  Zn  regions  with  n bent  lines  when  the  angle  at 

each  zig  is  30°  ? 

20 Use the repertoire method to solve the general five-parameter recurrence

$$h(1) = \alpha;$$

$$h(2n + j) = 4h(n) + \gamma_j n + \beta_j, \qquad \text{for } j = 0, 1 \text{ and } n \ge 1.$$

Hint: Try the functions h(n) = n and $h(n) = n^2$.




21 Suppose there are 2n people in a circle; the first n are “good guys”
and the last n are “bad guys.” Show that there is always an integer m
(depending on n) such that, if we go around the circle executing every
mth person, all the bad guys are first to go. (For example, when n = 3
we can take m = 5; when n = 4 we can take m = 30.)

Bonus  problems 

22 Show that it’s possible to construct a Venn diagram for all $2^n$ possible
subsets of n given sets, using n convex polygons that are congruent to
each other and rotated about a common center.

23  Suppose  that  Josephus  finds  himself  in  a given  position  j,  but  he  has  a 

chance  to  name  the  elimination  parameter  q such  that  every  qth  person 
is  executed.  Can  he  always  save  himself? 

Research  problems 

24 Find all recurrence relations of the form

$$X_n = \frac{a_0 + a_1 X_{n-1} + \cdots + a_k X_{n-k}}{b_1 X_{n-1} + \cdots + b_k X_{n-k}}$$

whose solution is periodic.

25  Solve  infinitely  many  cases  of  the  four-peg  Tower  of  Hanoi  problem  by 
proving  that  equality  holds  in  the  relation  of  exercise  17. 

26 Generalizing exercise 23, let’s say that a Josephus subset of {1, 2, ..., n}
is a set of k numbers such that, for some q, the people with the other n - k
numbers will be eliminated first. (These are the k positions of the “good
guys” Josephus wants to save.) It turns out that when n = 9, three of the
$2^9$ possible subsets are non-Josephus, namely {1, 2, 5, 8, 9}, {2, 3, 4, 5, 8},
and {2, 5, 6, 7, 8}. There are 13 non-Josephus sets when n = 12, none for
any other values of $n \le 12$. Are non-Josephus subsets rare for large n?


Yes,  and  well  done 
if  you  find  them. 


A term is how long
this course lasts.


Sums 


SUMS  ARE  EVERYWHERE  in  mathematics,  so  we  need  basic  tools  to  handle 
them.  This  chapter  develops  the  notation  and  general  techniques  that  make 
summation  user-friendly. 

2.1  NOTATION 

In Chapter 1 we encountered the sum of the first n integers, which
we wrote out as $1 + 2 + 3 + \cdots + (n-1) + n$. The ‘$\cdots$’ in such formulas tells
us to complete the pattern established by the surrounding terms. Of course
we have to watch out for sums like $1 + 7 + \cdots + 41.7$, which are meaningless
without a mitigating context. On the other hand, the inclusion of terms like
3 and $(n-1)$ was a bit of overkill; the pattern would presumably have been
clear if we had written simply $1 + 2 + \cdots + n$. Sometimes we might even be
so bold as to write just $1 + \cdots + n$.

We’ll  be  working  with  sums  of  the  general  form 

$$a_1 + a_2 + \cdots + a_n, \tag{2.1}$$

where each $a_k$ is a number that has been defined somehow. This notation has
the  advantage  that  we  can  “see”  the  whole  sum,  almost  as  if  it  were  written 
out  in  full,  if  we  have  a good  enough  imagination. 

Each  element  of  a sum  is  called  a term.  The  terms  are  often  specified 
implicitly  as  formulas  that  follow  a readily  perceived  pattern,  and  in  such  cases 
we  must  sometimes  write  them  in  an  expanded  form  so  that  the  meaning  is 
clear.  For  example,  if 

$$1 + 2 + \cdots + 2^{n-1}$$

is supposed to denote a sum of n terms, not of $2^{n-1}$, we should write it more
explicitly as

$$2^0 + 2^1 + \cdots + 2^{n-1}.$$

The three-dots notation has many uses, but it can be ambiguous and a
bit long-winded. Other alternatives are available, notably the delimited form

$$\sum_{k=1}^{n} a_k, \tag{2.2}$$

which is called Sigma-notation because it uses the Greek letter $\Sigma$
(uppercase sigma). This notation tells us to include in the sum precisely those
terms $a_k$ whose index k is an integer that lies between the lower and upper
limits 1 and n, inclusive. In words, we “sum over k, from 1 to n.” Joseph
Fourier introduced this delimited $\sum$-notation in 1820, and it soon took the
mathematical world by storm.

Incidentally, the quantity after $\sum$ (here $a_k$) is called the summand.

The index variable k is said to be bound to the $\sum$ sign in (2.2), because
the k in $a_k$ is unrelated to appearances of k outside the Sigma-notation. Any
other letter could be substituted for k here without changing the meaning of
(2.2). The letter i is often used (perhaps because it stands for “index”), but
we’ll generally sum on k since it’s wise to keep i for $\sqrt{-1}$.

It turns out that a generalized Sigma-notation is even more useful than
the delimited form: We simply write one or more conditions under the $\sum$,
to  specify  the  set  of  indices  over  which  summation  should  take  place.  For 
example,  the  sums  in  (2.1)  and  (2.2)  can  also  be  written  as 


$$\sum_{1 \le k \le n} a_k. \tag{2.3}$$

“The sign $\sum_{i=1}^{i=\infty}$
indicates that
one must give
the integer i
all its values
1, 2, 3, ..., and
take the sum
of the terms.”

J. Fourier [102]


Well, I wouldn’t
want to use a or n
as the index variable
instead of k in
(2.2); those letters
are “free variables”
that do have meaning
outside the $\sum$
here.



In  this  particular  example  there  isn’t  much  difference  between  the  new  form 
and  (2.2),  but  the  general  form  allows  us  to  take  sums  over  index  sets  that 
aren’t restricted to consecutive integers. For example, we can express the sum
of the squares of all odd positive integers below 100 as follows:

$$\sum_{\substack{1 \le k < 100 \\ k \text{ odd}}} k^2.$$

The delimited equivalent of this sum,

$$\sum_{k=0}^{49} (2k+1)^2,$$

is more cumbersome and less clear. Similarly, the sum of reciprocals of all
prime numbers between 1 and N is

$$\sum_{\substack{p \le N \\ p \text{ prime}}} \frac{1}{p};$$




The  summation 
symbol  looks  like 
a distorted  pacman. 


A tidy  sum. 


That's nothing.
You should see how
many times $\Sigma$ appears
in The Iliad.


the delimited form would require us to write

$$\sum_{k=1}^{\pi(N)} \frac{1}{p_k},$$

where $p_k$ denotes the kth prime and $\pi(N)$ is the number of primes $\le N$.
(Incidentally, this sum gives the approximate average number of distinct prime
factors of a random integer near N, since about 1/p of those integers are
divisible by p. Its value for large N is approximately $\ln\ln N + 0.261972128$;
$\ln x$ stands for the natural logarithm of x, and $\ln\ln x$ stands for $\ln(\ln x)$.)
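
To see the two styles of index specification side by side, here is a small Python sketch. It is only an illustration under our own assumptions: the helper names (`is_prime`, `odd_squares_general`, and so on) are made up for this example, and trial division is used merely because the bounds are small.

```python
import math


def is_prime(m: int) -> bool:
    """Trial division; adequate for the small bounds used in this sketch."""
    if m < 2:
        return False
    return all(m % d != 0 for d in range(2, math.isqrt(m) + 1))


def odd_squares_general() -> int:
    """General form: sum of k^2 over all k with 1 <= k < 100 and k odd."""
    return sum(k * k for k in range(1, 100) if k % 2 == 1)


def odd_squares_delimited() -> int:
    """Delimited form: sum of (2k+1)^2 for k = 0, 1, ..., 49."""
    return sum((2 * k + 1) ** 2 for k in range(0, 50))


def reciprocal_prime_sum(N: int) -> float:
    """Sum of 1/p over all primes p <= N, written as a condition on p."""
    return sum(1.0 / p for p in range(1, N + 1) if is_prime(p))


def loglog_estimate(N: int) -> float:
    """The approximation ln ln N + 0.261972128 quoted in the text."""
    return math.log(math.log(N)) + 0.261972128


print(odd_squares_general(), odd_squares_delimited())
print(reciprocal_prime_sum(10**4), loglog_estimate(10**4))
```

The condition-style versions read almost exactly like the conditions written under the sign, which is the point of the general form.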

The biggest advantage of general Sigma-notation is that we can manipulate
it more easily than the delimited form. For example, suppose we want
to change the index variable k to k + 1. With the general form, we have

$$\sum_{1 \le k \le n} a_k = \sum_{1 \le k+1 \le n} a_{k+1};$$

it’s easy to see what’s going on, and we can do the substitution almost without
thinking. But with the delimited form, we have

$$\sum_{k=1}^{n} a_k = \sum_{k=0}^{n-1} a_{k+1};$$

it’s  harder  to  see  what’s  happened,  and  we’re  more  likely  to  make  a mistake. 
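
The change of index can be checked mechanically. The following sketch is our own illustration (the function name `shifted_sum_demo` and the finite window standing in for "all integers" are assumptions): it applies the substitution to both the summand and the condition, and compares the result with the delimited version.

```python
def shifted_sum_demo(a, n):
    """Compare three ways of writing the same sum of a[1], ..., a[n].

    `a` is any sequence indexable at 1..n; the window range(-5, n + 6)
    is a finite stand-in for "all integers k".
    """
    window = range(-5, n + 6)
    # General form: sum of a[k] over 1 <= k <= n.
    lhs = sum(a[k] for k in window if 1 <= k <= n)
    # Replace k by k+1 in both the summand and the condition.
    rhs = sum(a[k + 1] for k in window if 1 <= k + 1 <= n)
    # Delimited equivalent: k runs from 0 to n-1.
    delimited = sum(a[k + 1] for k in range(0, n))
    return lhs, rhs, delimited


# a[0] is unused padding so that indices start at 1.
print(shifted_sum_demo([0, 3, 1, 4, 1, 5], 5))
```

In the general form the new bounds never have to be worked out by hand; the condition carries them along.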

On the other hand, the delimited form isn’t completely useless. It’s
nice and tidy, and we can write it quickly because (2.2) has seven symbols
compared with (2.3)’s eight. Therefore we’ll often use $\sum$ with upper and
lower delimiters when we state a problem or present a result, but we’ll prefer
to work with relations-under-$\sum$ when we’re manipulating a sum whose index
variables need to be transformed.

The $\sum$ sign occurs more than 1000 times in this book, so we should be
sure that we know exactly what it means. Formally, we write

$$\sum_{P(k)} a_k \tag{2.4}$$

as an abbreviation for the sum of all terms $a_k$ such that k is an integer
satisfying a given property P(k). (A “property P(k)” is any statement about
k that can be either true or false.) For the time being, we’ll assume that
only finitely many integers k satisfying P(k) have $a_k \ne 0$; otherwise infinitely
many nonzero numbers are being added together, and things can get a bit
tricky. At the other extreme, if P(k) is false for all integers k, we have an
“empty” sum; the value of an empty sum is defined to be zero.

A slightly modified form of (2.4) is used when a sum appears within the
text of a paragraph rather than in a displayed equation: We write ‘$\sum_{P(k)} a_k$’,
attaching property P(k) as a subscript of $\sum$, so that the formula won’t stick
out too much. Similarly, ‘$\sum_{k=1}^{n} a_k$’ is a convenient alternative to (2.2) when
we want to confine the notation to a single line.

People  are  often  tempted  to  write 


$$\sum_{k=2}^{n-1} k(k-1)(n-k) \qquad\text{instead of}\qquad \sum_{k=0}^{n} k(k-1)(n-k)$$


because  the  terms  for  k = 0,  1,  and  n in  this  sum  are  zero.  Somehow  it 
seems more efficient to add up n - 2 terms instead of n + 1 terms. But such
temptations should be resisted; efficiency of computation is not the same as
efficiency of understanding! We will find it advantageous to keep upper and
lower bounds on an index of summation as simple as possible, because sums
can be manipulated much more easily when the bounds are simple. Indeed,
the form $\sum_{k=2}^{n-1}$ can even be dangerously ambiguous, because its meaning is
not at all clear when n = 0 or n = 1 (see exercise 1). Zero-valued terms cause
no harm, and they often save a lot of trouble.

So  far  the  notations  we’ve  been  discussing  are  quite  standard,  but  now 
we  are  about  to  make  a radical  departure  from  tradition.  Kenneth  Iverson 
introduced  a wonderful  idea  in  his  programming  language  APL  [161,  page  11], 
and  we’ll  see  that  it  greatly  simplifies  many  of  the  things  we  want  to  do  in 
this  book.  The  idea  is  simply  to  enclose  a true-or-false  statement  in  brackets, 
and to say that the result is 1 if the statement is true, 0 if the statement is
false. For example,

$$[p \text{ prime}] = \begin{cases} 1, & \text{if } p \text{ is a prime number;} \\ 0, & \text{if } p \text{ is not a prime number.} \end{cases}$$

Iverson’s convention allows us to express sums with no constraints whatever
on the index of summation, because we can rewrite (2.4) in the form

$$\sum_{k} a_k\,[P(k)]. \tag{2.5}$$


Hey: The “Kronecker
delta” that
I’ve seen in other
books (I mean
$\delta_{kn}$, which is 1 if
k = n, 0 otherwise)
is just a
special case of
Iverson’s convention:
We can write
[k = n] instead.


If P(k) is false, the term $a_k[P(k)]$ is zero, so we can safely include it among
the terms being summed. This makes it easy to manipulate the index of
summation, because we don’t have to fuss with boundary conditions.

A slight technicality needs to be mentioned: Sometimes $a_k$ isn’t defined
for all integers k. We get around this difficulty by assuming that [P(k)] is
“very strongly zero” when P(k) is false; it’s so much zero, it makes $a_k[P(k)]$
equal to zero even when $a_k$ is undefined. For example, if we use Iverson’s
convention to write the sum of reciprocal primes $\le N$ as

$$\sum_{p}\, [p \text{ prime}]\,[p \le N]/p,$$


... and it’s less
likely  to  lose  points 
on  an  exam  for 
“lack  of  rigor.” 


there’s no problem of division by zero when p = 0, because our convention
tells us that $[0 \text{ prime}]\,[0 \le N]/0 = 0$.
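
One way to mimic “very strongly zero” in a program is to avoid evaluating the summand at all when the bracket is 0. The sketch below is our own illustration, not a standard library facility: `bracket`, `strongly_zero_term`, and the finite window of indices are all assumptions made for the example.

```python
import math
from fractions import Fraction


def bracket(statement: bool) -> int:
    """Iverson's convention: 1 if the statement is true, 0 if it is false."""
    return 1 if statement else 0


def is_prime(p: int) -> bool:
    return p >= 2 and all(p % d != 0 for d in range(2, math.isqrt(p) + 1))


def strongly_zero_term(condition: bool, summand):
    """Return summand() when the condition holds, and 0 otherwise.

    The summand is passed as a zero-argument function, so it is never
    evaluated when the bracket is 0, even if it would be undefined.
    """
    return summand() if condition else 0


def reciprocal_primes(N: int) -> Fraction:
    """Sum over a window of integers p of [p prime][p <= N]/p.

    The window range(-10, N + 11) stands in for "all integers"; it
    deliberately includes p = 0, where 1/p is undefined.
    """
    return sum(
        (strongly_zero_term(is_prime(p) and p <= N, lambda p=p: Fraction(1, p))
         for p in range(-10, N + 11)),
        Fraction(0),
    )


print(bracket(3 == 3), bracket(3 == 4))   # the Kronecker delta as a bracket
print(reciprocal_primes(10))
```

The term for p = 0 contributes nothing and raises no error, because `Fraction(1, 0)` is never constructed.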

Let’s sum up what we’ve discussed so far about sums. There are two
good ways to express a sum of terms: One way uses ‘$\cdots$’, the other uses
‘$\sum$’. The three-dots form often suggests useful manipulations, particularly
the combination of adjacent terms, since we might be able to spot a simplifying
pattern if we let the whole sum hang out before our eyes. But too much detail
can also be overwhelming. Sigma-notation is compact, impressive to family
and friends, and often suggestive of manipulations that are not obvious in
three-dots form. When we work with Sigma-notation, zero terms are not
generally harmful; in fact, zeros often make $\sum$-manipulation easier.

2.2 SUMS AND RECURRENCES

OK, we understand now how to express sums with fancy notation.
But  how  does  a person  actually  go  about  finding  the  value  of  a sum?  One  way 
is  to  observe  that  there’s  an  intimate  relation  between  sums  and  recurrences. 
The  sum 


$$S_n = \sum_{k=0}^{n} a_k$$

(Think of $S_n$ as
not just a single
number, but as a
sequence defined for
all $n \ge 0$.)

is equivalent to the recurrence

$$\begin{aligned} S_0 &= a_0; \\ S_n &= S_{n-1} + a_n, \qquad \text{for } n > 0. \end{aligned} \tag{2.6}$$


Therefore  we  can  evaluate  sums  in  closed  form  by  using  the  methods  we 
learned  in  Chapter  1 to  solve  recurrences  in  closed  form. 

For example, if $a_n$ is equal to a constant plus a multiple of n, the
sum-recurrence (2.6) takes the following general form:

$$\begin{aligned} R_0 &= \alpha; \\ R_n &= R_{n-1} + \beta + \gamma n, \qquad \text{for } n > 0. \end{aligned} \tag{2.7}$$

Proceeding as in Chapter 1, we find $R_1 = \alpha + \beta + \gamma$, $R_2 = \alpha + 2\beta + 3\gamma$, and
so on; in general the solution can be written in the form

$$R_n = A(n)\,\alpha + B(n)\,\beta + C(n)\,\gamma, \tag{2.8}$$

where A(n), B(n), and C(n) are the coefficients of dependence on the general
parameters $\alpha$, $\beta$, and $\gamma$.

The repertoire method tells us to try plugging in simple functions of n
for $R_n$, hoping to find constant parameters $\alpha$, $\beta$, and $\gamma$ where the solution is
especially simple. Setting $R_n = 1$ implies $\alpha = 1$, $\beta = 0$, $\gamma = 0$; hence

$$A(n) = 1.$$

Setting $R_n = n$ implies $\alpha = 0$, $\beta = 1$, $\gamma = 0$; hence

$$B(n) = n.$$

Setting $R_n = n^2$ implies $\alpha = 0$, $\beta = -1$, $\gamma = 2$; hence

$$2C(n) - B(n) = n^2$$

and we have $C(n) = (n^2 + n)/2$. Easy as pie.

Actually easier: $\pi = \sum_{n \ge 0} \frac{8}{(4n+1)(4n+3)}$.

Therefore if we wish to evaluate

$$\sum_{k=0}^{n} (a + bk),$$

the sum-recurrence (2.6) boils down to (2.7) with $\alpha = \beta = a$, $\gamma = b$, and the
answer is $aA(n) + aB(n) + bC(n) = a(n+1) + b(n+1)n/2$.

Conversely, many recurrences can be reduced to sums; therefore the special
methods for evaluating sums that we’ll be learning later in this chapter
will help us solve recurrences that might otherwise be difficult. The Tower of
Hanoi recurrence is a case in point:

$$\begin{aligned} T_0 &= 0; \\ T_n &= 2T_{n-1} + 1, \qquad \text{for } n > 0. \end{aligned}$$

It can be put into the special form (2.6) if we divide both sides by $2^n$:

$$\begin{aligned} T_0/2^0 &= 0; \\ T_n/2^n &= T_{n-1}/2^{n-1} + 1/2^n, \qquad \text{for } n > 0. \end{aligned}$$

Now we can set $S_n = T_n/2^n$, and we have

$$\begin{aligned} S_0 &= 0; \\ S_n &= S_{n-1} + 2^{-n}, \qquad \text{for } n > 0. \end{aligned}$$

It follows that

$$S_n = \sum_{k=1}^{n} 2^{-k}.$$

(Notice that we’ve left the term for k = 0 out of this sum.) The sum of the
geometric series $2^{-1} + 2^{-2} + \cdots + 2^{-n} = (\tfrac12)^1 + (\tfrac12)^2 + \cdots + (\tfrac12)^n$ will be derived
later in this chapter; it turns out to be $1 - (\tfrac12)^n$. Hence $T_n = 2^n S_n = 2^n - 1$.
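
Both directions of this correspondence are easy to check numerically. The sketch below is an illustration under our own naming assumptions (`R`, `R_closed`, `hanoi_via_sum` are not from the text): it iterates recurrence (2.7), compares it with the closed form obtained from the repertoire method, and recovers the Tower of Hanoi numbers from the sum of powers of 1/2.

```python
from fractions import Fraction


def R(n, alpha, beta, gamma):
    """Iterate recurrence (2.7): R_0 = alpha, R_n = R_{n-1} + beta + gamma*n."""
    r = alpha
    for m in range(1, n + 1):
        r = r + beta + gamma * m
    return r


def R_closed(n, alpha, beta, gamma):
    """Closed form (2.8) with A(n) = 1, B(n) = n, C(n) = (n^2 + n)/2."""
    return alpha + n * beta + (n * n + n) // 2 * gamma


def hanoi_via_sum(n):
    """T_n = 2^n * S_n, where S_n is the sum of 2^(-k) for 1 <= k <= n."""
    S = sum((Fraction(1, 2 ** k) for k in range(1, n + 1)), Fraction(0))
    return 2 ** n * S


a, b, n = 3, 5, 10
direct = sum(a + b * k for k in range(0, n + 1))
print(direct, R(n, a, a, b), R_closed(n, a, a, b))
print([int(hanoi_via_sum(m)) for m in range(6)])
```

Exact rational arithmetic is used for the Hanoi sum so that multiplying back by $2^n$ gives an integer with no rounding.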

We have converted $T_n$ to $S_n$ in this derivation by noticing that the
recurrence could be divided by $2^n$. This trick is a special case of a general
technique that can reduce virtually any recurrence of the form

$$a_n T_n = b_n T_{n-1} + c_n \tag{2.9}$$

to a sum. The idea is to multiply both sides by a summation factor, $s_n$:

$$s_n a_n T_n = s_n b_n T_{n-1} + s_n c_n.$$

This factor $s_n$ is cleverly chosen to make

$$s_n b_n = s_{n-1} a_{n-1}.$$

Then if we write $S_n = s_n a_n T_n$ we have a sum-recurrence,

$$S_n = S_{n-1} + s_n c_n.$$

Hence

$$S_n = s_0 a_0 T_0 + \sum_{k=1}^{n} s_k c_k = s_1 b_1 T_0 + \sum_{k=1}^{n} s_k c_k,$$

and the solution to the original recurrence (2.9) is

$$T_n = \frac{1}{s_n a_n} \Bigl( s_1 b_1 T_0 + \sum_{k=1}^{n} s_k c_k \Bigr). \tag{2.10}$$

(The value of $s_1$
cancels out, so it
can be anything
but zero.)

For example, when n = 1 we get $T_1 = (s_1 b_1 T_0 + s_1 c_1)/s_1 a_1 = (b_1 T_0 + c_1)/a_1$.

But how can we be clever enough to find the right $s_n$? No problem: The
relation $s_n = s_{n-1} a_{n-1}/b_n$ can be unfolded to tell us that the fraction

$$s_n = \frac{a_{n-1} a_{n-2} \cdots a_1}{b_n b_{n-1} \cdots b_2}, \tag{2.11}$$

or any convenient constant multiple of this value, will be a suitable summation
factor. For example, the Tower of Hanoi recurrence has $a_n = 1$ and $b_n = 2$;
the general method we’ve just derived says that $s_n = 2^{-n}$ is a good thing to
multiply by, if we want to reduce the recurrence to a sum. We don’t need a
brilliant flash of inspiration to discover this multiplier.
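
The summation-factor recipe can be written out as a short program. What follows is a minimal sketch under stated assumptions: the function names are ours, the coefficients are supplied as Python functions `a`, `b`, `c` of the index, and every `a(k)` and `b(k)` that is used is assumed to be nonzero.

```python
from fractions import Fraction


def summation_factor(a, b, m):
    """s_m from (2.11): a_{m-1} ... a_1 / (b_m ... b_2); empty products are 1."""
    value = Fraction(1)
    for j in range(1, m):
        value *= Fraction(a(j))
    for j in range(2, m + 1):
        value /= Fraction(b(j))
    return value


def solve_by_summation_factor(a, b, c, T0, n):
    """Solve a(n) T_n = b(n) T_{n-1} + c(n) via formula (2.10)."""
    if n == 0:
        return Fraction(T0)
    s = lambda m: summation_factor(a, b, m)
    total = s(1) * b(1) * T0 + sum(s(k) * c(k) for k in range(1, n + 1))
    return total / (s(n) * a(n))


def solve_directly(a, b, c, T0, n):
    """Reference answer: unfold the recurrence one step at a time."""
    T = Fraction(T0)
    for m in range(1, n + 1):
        T = (b(m) * T + c(m)) / a(m)
    return T


# Tower of Hanoi: a_n = 1, b_n = 2, c_n = 1, T_0 = 0.
hanoi = solve_by_summation_factor(lambda n: 1, lambda n: 2, lambda n: 1, 0, 10)
print(hanoi)
```

For the Tower of Hanoi this computes $s_n = 2^{-(n-1)}$, a constant multiple of the $2^{-n}$ mentioned above, which works equally well.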

We must be careful, as always, not to divide by zero. The
summation-factor method works whenever all the a’s and all the b’s are nonzero.

Let’s apply these ideas to a recurrence that arises in the study of
“quicksort,” one of the most important methods for sorting data inside a computer.
The average number of comparison steps made by quicksort when it is applied
to n items in random order satisfies the recurrence

$$\begin{aligned} C_0 &= 0; \\ C_n &= n + 1 + \frac{2}{n} \sum_{k=0}^{n-1} C_k, \qquad \text{for } n > 0. \end{aligned} \tag{2.12}$$

Hmmm. This looks much scarier than the recurrences we’ve seen before; it
includes a sum over all previous values, and a division by n. Trying small
cases gives us some data ($C_1 = 2$, $C_2 = 5$, $C_3 = \frac{26}{3}$) but doesn’t do anything
to quell our fears.

We can, however, reduce the complexity of (2.12) systematically, by first
getting rid of the division and then getting rid of the $\sum$ sign. The idea is to
multiply both sides by n, obtaining the relation

$$n C_n = n^2 + n + 2 \sum_{k=0}^{n-1} C_k, \qquad \text{for } n > 0;$$

hence, if we replace n by n - 1,

$$(n-1) C_{n-1} = (n-1)^2 + (n-1) + 2 \sum_{k=0}^{n-2} C_k, \qquad \text{for } n - 1 > 0.$$

We can now subtract the second equation from the first, and the $\sum$ sign
disappears:

$$n C_n - (n-1) C_{n-1} = 2n + 2C_{n-1}, \qquad \text{for } n > 1.$$

It turns out that this relation also holds when n = 1, because $C_1 = 2$.
Therefore the original recurrence for $C_n$ reduces to a much simpler one:

$$\begin{aligned} C_0 &= 0; \\ n C_n &= (n+1) C_{n-1} + 2n, \qquad \text{for } n > 0. \end{aligned}$$

Progress. We’re now in a position to apply a summation factor, since this
recurrence has the form of (2.9) with $a_n = n$, $b_n = n + 1$, and $c_n = 2n$.
The general method described on the preceding page tells us to multiply the
recurrence through by some multiple of

$$s_n = \frac{a_{n-1} a_{n-2} \cdots a_1}{b_n b_{n-1} \cdots b_2} = \frac{(n-1) \cdot (n-2) \cdot \ldots \cdot 1}{(n+1) \cdot n \cdot \ldots \cdot 3} = \frac{2}{(n+1)n}.$$


(Quicksort was
invented by Hoare
in 1962 [158].)

We started with a
$\sum$ in the recurrence,
and worked
hard to get rid of it.
But then after applying
a summation
factor, we came up
with another $\sum$.
Are sums good, or
bad, or what?


The solution, according to (2.10), is therefore

$$C_n = 2(n+1) \sum_{k=1}^{n} \frac{1}{k+1}.$$

The sum that remains is very similar to a quantity that arises frequently
in applications. It arises so often, in fact, that we give it a special name and
a special notation:

$$H_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n} = \sum_{k=1}^{n} \frac{1}{k}. \tag{2.13}$$

The letter H stands for “harmonic”; $H_n$ is a harmonic number, so called
because the kth harmonic produced by a violin string is the fundamental
tone produced by a string that is 1/k times as long.

We can complete our study of the quicksort recurrence (2.12) by putting
$C_n$ into closed form; this will be possible if we can express $C_n$ in terms of $H_n$.
The sum in our formula for $C_n$ is

$$\sum_{k=1}^{n} \frac{1}{k+1} = \sum_{1 \le k \le n} \frac{1}{k+1}.$$

We can relate this to $H_n$ without much difficulty by changing k to k - 1 and
revising the boundary conditions:

$$\begin{aligned} \sum_{1 \le k \le n} \frac{1}{k+1} &= \sum_{1 \le k-1 \le n} \frac{1}{k} = \sum_{2 \le k \le n+1} \frac{1}{k} \\ &= \Bigl( \sum_{1 \le k \le n} \frac{1}{k} \Bigr) - \frac{1}{1} + \frac{1}{n+1} = H_n - \frac{n}{n+1}. \end{aligned}$$

But your spelling is
alwrong.

Alright! We have found the sum needed to complete the solution to (2.12):
The average number of comparisons made by quicksort when it is applied to
n randomly ordered items of data is

$$C_n = 2(n+1) H_n - 2n. \tag{2.14}$$

As usual, we check that small cases are correct: $C_0 = 0$, $C_1 = 2$, $C_2 = 5$.
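
The small cases can also be checked by machine. This sketch (function names are our own, chosen for illustration) computes $C_n$ straight from recurrence (2.12) with exact fractions and compares it with the closed form (2.14).

```python
from fractions import Fraction


def quicksort_comparisons(n_max):
    """Return [C_0, ..., C_{n_max}] computed exactly from recurrence (2.12)."""
    C = [Fraction(0)]
    for n in range(1, n_max + 1):
        # C_n = n + 1 + (2/n) * (C_0 + ... + C_{n-1})
        C.append(n + 1 + Fraction(2, n) * sum(C[:n]))
    return C


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0 (an empty sum)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def closed_form(n):
    """Formula (2.14): C_n = 2(n+1)H_n - 2n."""
    return 2 * (n + 1) * harmonic(n) - 2 * n


C = quicksort_comparisons(10)
for n in range(4):
    print(n, C[n], closed_form(n))
```

Agreement over a range of n is of course not a proof, but it is a cheap guard against algebra slips in the derivation.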
2.3 MANIPULATION OF SUMS

The  key  to  success  with  sums  is  an  ability  to  change  one  into 
another  that  is  simpler  or  closer  to  some  goal.  And  it’s  easy  to  do  this  by 
learning  a few  basic  rules  of  transformation  and  by  practicing  their  use. 

Let  K be  any  finite  set  of  integers.  Sums  over  the  elements  of  K can  be 
transformed  by  using  three  simple  rules: 


$$\sum_{k \in K} c\,a_k = c \sum_{k \in K} a_k; \qquad \text{(distributive law)} \tag{2.15}$$

$$\sum_{k \in K} (a_k + b_k) = \sum_{k \in K} a_k + \sum_{k \in K} b_k; \qquad \text{(associative law)} \tag{2.16}$$

$$\sum_{k \in K} a_k = \sum_{p(k) \in K} a_{p(k)}. \qquad \text{(commutative law)} \tag{2.17}$$

The distributive law allows us to move constants in and out of a $\sum$. The
associative law allows us to break a $\sum$ into two parts, or to combine two $\sum$’s
into one. The commutative law says that we can reorder the terms in any way
we please; here p(k) is any permutation of the set of all integers. For example,
if K = {-1, 0, +1} and if p(k) = -k, these three laws tell us respectively that

$$c\,a_{-1} + c\,a_0 + c\,a_1 = c\,(a_{-1} + a_0 + a_1); \qquad \text{(distributive law)}$$

$$(a_{-1} + b_{-1}) + (a_0 + b_0) + (a_1 + b_1) = (a_{-1} + a_0 + a_1) + (b_{-1} + b_0 + b_1); \qquad \text{(associative law)}$$

$$a_{-1} + a_0 + a_1 = a_1 + a_0 + a_{-1}. \qquad \text{(commutative law)}$$

Gauss’s  trick  in  Chapter  1 can  be  viewed  as  an  application  of  these  three 
basic  laws.  Suppose  we  want  to  compute  the  general  sum  of  an  arithmetic 
progression, 

$$S = \sum_{0 \le k \le n} (a + bk).$$

By the commutative law we can replace k by n - k, obtaining

$$S = \sum_{0 \le n-k \le n} \bigl(a + b(n-k)\bigr) = \sum_{0 \le k \le n} (a + bn - bk).$$

These two equations can be added by using the associative law:

$$2S = \sum_{0 \le k \le n} \bigl((a + bk) + (a + bn - bk)\bigr) = \sum_{0 \le k \le n} (2a + bn).$$


Not to be confused
with finance.


Why not call it
permutative instead
of commutative?


This is something
like changing variables
inside an
integral, but easier.

“What’s one
and one and one
and one and one
and one and one
and one and one
and one?”

“I don’t know,”
said Alice.

“I lost count.”

“She can’t do
Addition.”

-Lewis Carroll [44]


Additional, eh?

And we can now apply the distributive law and evaluate a trivial sum:

$$2S = (2a + bn) \sum_{0 \le k \le n} 1 = (2a + bn)(n + 1).$$

Dividing by 2, we have proved that

$$\sum_{k=0}^{n} (a + bk) = (a + \tfrac12 bn)(n + 1). \tag{2.18}$$

The right-hand side can be remembered as the average of the first and last
terms, namely $\tfrac12 (a + (a + bn))$, times the number of terms, namely (n + 1).

It’s important to bear in mind that the function p(k) in the general
commutative law (2.17) is supposed to be a permutation of all the integers. In
other words, for every integer n there should be exactly one integer k such that
p(k) = n. Otherwise the commutative law might fail; exercise 3 illustrates
this with a vengeance. Transformations like p(k) = k + c or p(k) = c - k,
where c is an integer constant, are always permutations, so they always work.
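
A small experiment shows why the permutation requirement matters. In this sketch the index set, the values of $a_k$, the finite window standing in for "all integers", and the non-permutation $p(k) = k^2$ are all our own choices for illustration; they are not taken from exercise 3.

```python
def sum_over(K, a):
    """Sum of a[k] over k in K."""
    return sum(a[k] for k in K)


def reindexed_sum(K, a, p, window):
    """Sum of a[p(k)] over all k in `window` such that p(k) is in K."""
    return sum(a[p(k)] for k in window if p(k) in K)


K = {-1, 0, 1}
a = {-1: 10, 0: 20, 1: 30}
window = range(-10, 11)   # finite stand-in for the set of all integers

original = sum_over(K, a)
negated = reindexed_sum(K, a, lambda k: -k, window)      # p(k) = -k
shifted = reindexed_sum(K, a, lambda k: 5 - k, window)   # p(k) = c - k with c = 5
squared = reindexed_sum(K, a, lambda k: k * k, window)   # not a permutation

print(original, negated, shifted, squared)
```

With $p(k) = k^2$ the value 1 is hit twice (by k = 1 and k = -1) and the value -1 is never hit, so the term $a_1$ is counted twice and $a_{-1}$ is lost.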

On the other hand, we can relax the permutation restriction a little bit:
We need to require only that there be exactly one integer k with p(k) = n
when n is an element of the index set K. If $n \notin K$ (that is, if n is not in K),
it doesn’t matter how often p(k) = n occurs, because such k don’t take part
in the sum. Thus, for example, we can argue that

$$\sum_{\substack{k \in K \\ k \text{ even}}} a_k = \sum_{\substack{n \in K \\ n \text{ even}}} a_n = \sum_{\substack{2k \in K \\ 2k \text{ even}}} a_{2k} = \sum_{2k \in K} a_{2k}, \tag{2.19}$$

since there’s exactly one k such that 2k = n when $n \in K$ and n is even.

Iverson’s convention, which allows us to obtain the values 0 or 1 from
logical statements in the middle of a formula, can be used together with the
distributive, associative, and commutative laws to deduce additional
properties of sums. For example, here is an important rule for combining different
sets of indices: If K and K′ are any sets of integers, then

$$\sum_{k \in K} a_k + \sum_{k \in K'} a_k = \sum_{k \in K \cap K'} a_k + \sum_{k \in K \cup K'} a_k. \tag{2.20}$$

This follows from the general formulas

$$\sum_{k \in K} a_k = \sum_{k} a_k\,[k \in K] \tag{2.21}$$

and

$$[k \in K] + [k \in K'] = [k \in K \cap K'] + [k \in K \cup K']. \tag{2.22}$$

Typically we use rule (2.20) either to combine two almost-disjoint index sets,
as in

$$\sum_{k=1}^{m} a_k + \sum_{k=m}^{n} a_k = a_m + \sum_{k=1}^{n} a_k, \qquad \text{for } 1 \le m \le n;$$

or to split off a single term from a sum, as in

$$\sum_{0 \le k \le n} a_k = a_0 + \sum_{1 \le k \le n} a_k, \qquad \text{for } n \ge 0. \tag{2.23}$$
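
Rule (2.20) translates directly into set operations. The sketch below uses an arbitrary summand $a_k = k^2 + 1$ and the index sets {1, ..., 5} and {5, ..., 10}; both choices, like the helper name `sum_over`, are assumptions made only for this illustration.

```python
def a(k):
    """An arbitrary summand chosen for the demonstration."""
    return k * k + 1


def sum_over(K, term):
    """Sum of term(k) over k in the finite set K."""
    return sum(term(k) for k in K)


m, n = 5, 10
K1 = set(range(1, m + 1))   # 1 <= k <= m
K2 = set(range(m, n + 1))   # m <= k <= n

# Rule (2.20): sums over K and K' versus sums over the intersection and union.
lhs = sum_over(K1, a) + sum_over(K2, a)
rhs = sum_over(K1 & K2, a) + sum_over(K1 | K2, a)

# Splitting off a single term, as in (2.23).
full = sum_over(set(range(0, n + 1)), a)
split = a(0) + sum_over(set(range(1, n + 1)), a)

print(lhs, rhs, full, split)
```

Here the intersection is the single index m, so the right side is exactly $a_m$ plus the sum from 1 to n.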

This operation of splitting off a term is the basis of a perturbation
method that often allows us to evaluate a sum in closed form. The idea
is to start with an unknown sum and call it $S_n$:

$$S_n = \sum_{0 \le k \le n} a_k.$$

(Name and conquer.) Then we rewrite $S_{n+1}$ in two ways, by splitting off both
its last term and its first term:

$$\begin{aligned} S_n + a_{n+1} = \sum_{0 \le k \le n+1} a_k &= a_0 + \sum_{1 \le k \le n+1} a_k \\ &= a_0 + \sum_{1 \le k+1 \le n+1} a_{k+1} \\ &= a_0 + \sum_{0 \le k \le n} a_{k+1}. \end{aligned} \tag{2.24}$$

Now we can work on this last sum and try to express it in terms of $S_n$. If we
succeed, we obtain an equation whose solution is the sum we seek.

For example, let’s use this approach to find the sum of a general
geometric progression,

$$S_n = \sum_{0 \le k \le n} a x^k.$$

The general perturbation scheme in (2.24) tells us that

$$S_n + a x^{n+1} = a x^0 + \sum_{0 \le k \le n} a x^{k+1},$$

and the sum on the right is $x \sum_{0 \le k \le n} a x^k = x S_n$ by the distributive law.
Therefore $S_n + a x^{n+1} = a + x S_n$, and we can solve for $S_n$ to obtain

$$\sum_{k=0}^{n} a x^k = \frac{a - a x^{n+1}}{1 - x}, \qquad \text{for } x \ne 1. \tag{2.25}$$

(The two sides of
(2.20) have been
switched here.)


If it’s geometric,
there should be a
geometric proof.


Ah yes, this formula
was drilled into me
in high school.


(When x = 1, the sum is of course simply (n + 1)a.) The right-hand side
can be remembered as the first term included in the sum minus the first term
excluded (the term after the last), divided by 1 minus the term ratio.

That was almost too easy. Let’s try the perturbation technique on a
slightly more difficult sum,

$$S_n = \sum_{0 \le k \le n} k\,2^k.$$

In this case we have $S_0 = 0$, $S_1 = 2$, $S_2 = 10$, $S_3 = 34$, $S_4 = 98$; what is the
general formula? According to (2.24) we have

$$S_n + (n+1)\,2^{n+1} = \sum_{0 \le k \le n} (k+1)\,2^{k+1};$$

so we want to express the right-hand sum in terms of $S_n$. Well, we can break
it into two sums with the help of the associative law,

$$\sum_{0 \le k \le n} k\,2^{k+1} + \sum_{0 \le k \le n} 2^{k+1},$$

and the first of the remaining sums is $2S_n$. The other sum is a geometric
progression, which equals $(2 - 2^{n+2})/(1 - 2) = 2^{n+2} - 2$ by (2.25). Therefore
we have $S_n + (n+1)\,2^{n+1} = 2S_n + 2^{n+2} - 2$, and algebra yields

$$\sum_{0 \le k \le n} k\,2^k = (n-1)\,2^{n+1} + 2.$$

Now we understand why $S_3 = 34$: It’s 32 + 2, not $2 \cdot 17$.
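
The perturbation equation and the resulting closed form are easy to test by brute force. This is only a sanity check, with function names of our own choosing.

```python
def S(n):
    """Brute-force value of the sum of k * 2^k for 0 <= k <= n."""
    return sum(k * 2 ** k for k in range(0, n + 1))


def perturbation_sides(n):
    """Both sides of S_n + (n+1) 2^(n+1) = 2 S_n + 2^(n+2) - 2."""
    left = S(n) + (n + 1) * 2 ** (n + 1)
    right = 2 * S(n) + 2 ** (n + 2) - 2
    return left, right


def closed_form(n):
    """The closed form (n-1) 2^(n+1) + 2 derived in the text."""
    return (n - 1) * 2 ** (n + 1) + 2


print([S(n) for n in range(5)])
print([closed_form(n) for n in range(5)])
```

Both lists should agree with the small cases listed above.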

A similar derivation with x in place of 2 would have given us the equation
$S_n + (n+1)x^{n+1} = x S_n + (x - x^{n+2})/(1 - x)$; hence we can deduce that

$$\sum_{k=0}^{n} k x^k = \frac{x - (n+1)x^{n+1} + n x^{n+2}}{(1 - x)^2}, \qquad \text{for } x \ne 1. \tag{2.26}$$


It’s interesting to note that we could have derived this closed form in a
completely different way, by using elementary techniques of differential
calculus. If we start with the equation

$$\sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}$$

and take the derivative of both sides with respect to x, we get

$$\sum_{k=0}^{n} k x^{k-1} = \frac{(1-x)\bigl(-(n+1)x^n\bigr) + 1 - x^{n+1}}{(1-x)^2} = \frac{1 - (n+1)x^n + n x^{n+1}}{(1-x)^2},$$

because the derivative of a sum is the sum of the derivatives of its terms. We
will see many more connections between calculus and discrete mathematics
in later chapters.

2.4  MULTIPLE  SUMS 

The  terms  of  a sum  might  be  specified  by  two  or  more  indices,  not 
just  by  one.  For  example,  here’s  a double  sum  of  nine  terms,  governed  by  two 
indices  j and  k: 

$$\begin{aligned}
\sum_{1\le j,k\le 3} a_j b_k = {} & a_1b_1 + a_1b_2 + a_1b_3 \\
 {}+{} & a_2b_1 + a_2b_2 + a_2b_3 \\
 {}+{} & a_3b_1 + a_3b_2 + a_3b_3.
\end{aligned}$$

We use the same notations and methods for such sums as we do for sums with a single index. Thus, if $P(j,k)$ is a property of $j$ and $k$, the sum of all terms $a_{j,k}$ such that $P(j,k)$ is true can be written in two ways, one of which uses Iverson's convention and sums over all pairs of integers $j$ and $k$:

$$\sum_{P(j,k)} a_{j,k} = \sum_{j,k} a_{j,k}\,[P(j,k)].$$

Only one $\sum$ sign is needed, although there is more than one index of summation; $\sum$ denotes a sum over all combinations of indices that apply.

We also have occasion to use two $\sum$'s, when we're talking about a sum of sums. For example,

$$\sum_j \sum_k a_{j,k}\,[P(j,k)]$$

is an abbreviation for

$$\sum_j \Bigl(\sum_k a_{j,k}\,[P(j,k)]\Bigr),$$

which is the sum, over all integers $j$, of $\sum_k a_{j,k}\,[P(j,k)]$, the latter being the sum over all integers $k$ of all terms $a_{j,k}$ for which $P(j,k)$ is true. In such cases we say that the double sum is "summed first on $k$." A sum that depends on more than one index can be summed first on any one of its indices.

In  this  regard  we  have  a basic  law  called  interchanging  the  order  of 
summation,  which  generalizes  the  associative  law  (2.16)  we  saw  earlier: 

$$\sum_j \sum_k a_{j,k}\,[P(j,k)] = \sum_{P(j,k)} a_{j,k} = \sum_k \sum_j a_{j,k}\,[P(j,k)]. \tag{2.27}$$


Oh no, a nine-term governor.

Notice that this doesn't mean to sum over all $j \ge 1$ and all $k \le 3$.

Multiple $\sum$'s are evaluated right to left (inside-out).

Who's panicking?

I think this rule is fairly obvious compared to some of the stuff in Chapter 1.

The middle term of this law is a sum over two indices. On the left, $\sum_j\sum_k$ stands for summing first on $k$, then on $j$. On the right, $\sum_k\sum_j$ stands for summing first on $j$, then on $k$. In practice when we want to evaluate a double sum in closed form, it's usually easier to sum it first on one index rather than on the other; we get to choose whichever is more convenient.

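Sums of sums are no reason to panic, but they can appear confusing to a beginner, so let's do some more examples. The nine-term sum we began with provides a good illustration of the manipulation of double sums, because that sum can actually be simplified, and the simplification process is typical of what we can do with $\sum\sum$'s:

$$\begin{aligned}
\sum_{1\le j,k\le 3} a_j b_k &= \sum_{j,k} a_j b_k\,[1\le j,k\le 3] = \sum_{j,k} a_j b_k\,[1\le j\le 3][1\le k\le 3] \\
&= \sum_j \sum_k a_j b_k\,[1\le j\le 3][1\le k\le 3] \\
&= \sum_j a_j\,[1\le j\le 3] \sum_k b_k\,[1\le k\le 3] \\
&= \sum_j a_j\,[1\le j\le 3] \Bigl(\sum_k b_k\,[1\le k\le 3]\Bigr) \\
&= \Bigl(\sum_j a_j\,[1\le j\le 3]\Bigr)\Bigl(\sum_k b_k\,[1\le k\le 3]\Bigr) \\
&= \Bigl(\sum_{j=1}^{3} a_j\Bigr)\Bigl(\sum_{k=1}^{3} b_k\Bigr).
\end{aligned}$$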


The first line here denotes a sum of nine terms in no particular order. The second line groups them in threes, $(a_1b_1 + a_1b_2 + a_1b_3) + (a_2b_1 + a_2b_2 + a_2b_3) + (a_3b_1 + a_3b_2 + a_3b_3)$. The third line uses the distributive law to factor out the $a$'s, since $a_j$ and $[1\le j\le 3]$ do not depend on $k$; this gives $a_1(b_1 + b_2 + b_3) + a_2(b_1 + b_2 + b_3) + a_3(b_1 + b_2 + b_3)$. The fourth line is the same as the third, but with a redundant pair of parentheses thrown in so that the fifth line won't look so mysterious. The fifth line factors out the $(b_1 + b_2 + b_3)$ that occurs for each value of $j$: $(a_1 + a_2 + a_3)(b_1 + b_2 + b_3)$. The last line is just another way to write the previous line. This method of derivation can be used to prove a general distributive law,

$$\sum_{\substack{j\in J\\ k\in K}} a_j b_k = \Bigl(\sum_{j\in J} a_j\Bigr)\Bigl(\sum_{k\in K} b_k\Bigr), \tag{2.28}$$

valid for all sets of indices $J$ and $K$.
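As a concrete illustration of the general distributive law (2.28), here is a small sketch that compares the double sum with the product of the two single sums. The data and the function names are made up for this example; nothing here comes from the original text.

```python
from itertools import product

def double_sum(a, b, J, K):
    """Left side of (2.28): sum of a_j * b_k over all pairs (j, k) in J x K."""
    return sum(a[j] * b[k] for j, k in product(J, K))

def product_of_sums(a, b, J, K):
    """Right side of (2.28): (sum of a_j over J) times (sum of b_k over K)."""
    return sum(a[j] for j in J) * sum(b[k] for k in K)

# Hypothetical terms and index sets; J and K need not be intervals.
a = {1: 2, 2: -1, 3: 5, 4: 7}
b = {1: 3, 2: 4, 3: -6}
J = [1, 3, 4]
K = [2, 3]

assert double_sum(a, b, J, K) == product_of_sums(a, b, J, K) == -28
```

The point of the law is economy: the left side needs $|J|\cdot|K|$ multiplications, the right side only one.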

The basic law (2.27) for interchanging the order of summation has many variations, which arise when we want to restrict the ranges of the indices instead of summing over all integers $j$ and $k$. These variations come in two flavors, vanilla and rocky road. First, the vanilla version:

$$\sum_{j\in J}\sum_{k\in K} a_{j,k} = \sum_{\substack{j\in J\\ k\in K}} a_{j,k} = \sum_{k\in K}\sum_{j\in J} a_{j,k}. \tag{2.29}$$

This is just another way to write (2.27), since the Iversonian $[j\in J,\ k\in K]$ factors into $[j\in J][k\in K]$. The vanilla-flavored law applies whenever the ranges of $j$ and $k$ are independent of each other.

The rocky-road formula for interchange is a little trickier. It applies when the range of an inner sum depends on the index variable of the outer sum:

$$\sum_{j\in J}\sum_{k\in K(j)} a_{j,k} = \sum_{k\in K'}\sum_{j\in J'(k)} a_{j,k}. \tag{2.30}$$

Here the sets $J$, $K(j)$, $K'$, and $J'(k)$ must be related in such a way that

$$[j\in J][k\in K(j)] = [k\in K'][j\in J'(k)].$$

A factorization  like  this  is  always  possible  in  principle,  because  we  can  let 
J = K’  be  the  set  of  all  integers  and  K(j)  = J’(k)  be  the  basic  property  P(j,  k) 
that  governs  a double  sum.  But  there  are  important  special  cases  where  the 
sets  J,  K(j),  K’,  and  J’ (k)  have  a simple  form.  These  arise  frequently  in 
applications.  For  example,  here’s  a particularly  useful  factorization: 


$$[1\le j\le n][j\le k\le n] = [1\le j\le k\le n] = [1\le k\le n][1\le j\le k]. \tag{2.31}$$


This Iversonian equation allows us to write

$$\sum_{j=1}^{n}\sum_{k=j}^{n} a_{j,k} = \sum_{1\le j\le k\le n} a_{j,k} = \sum_{k=1}^{n}\sum_{j=1}^{k} a_{j,k}. \tag{2.32}$$


One  of  these  two  sums  of  sums  is  usually  easier  to  evaluate  than  the  other; 
we  can  use  (2.32)  to  switch  from  the  hard  one  to  the  easy  one. 
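The three forms in (2.32) can be written down almost verbatim as nested loops. The sketch below is illustrative only: the summand `term` is an arbitrary stand-in for $a_{j,k}$, and the "sum over all integers" in the Iversonian form is approximated by a finite window that safely contains the triangle.

```python
def term(j, k):
    """An arbitrary, hypothetical summand a_{j,k} used only for the check."""
    return 10 * j + k * k

def sum_j_then_k(n):
    """Left side of (2.32): outer index j = 1..n, inner index k = j..n."""
    return sum(term(j, k) for j in range(1, n + 1) for k in range(j, n + 1))

def sum_k_then_j(n):
    """Right side of (2.32): outer index k = 1..n, inner index j = 1..k."""
    return sum(term(j, k) for k in range(1, n + 1) for j in range(1, k + 1))

def sum_with_iverson(n):
    """Middle of (2.32): every pair in a window, weighted by [1 <= j <= k <= n]."""
    window = range(-2, n + 3)
    return sum(term(j, k) * (1 <= j <= k <= n) for j in window for k in window)

for n in range(0, 8):
    assert sum_j_then_k(n) == sum_with_iverson(n) == sum_k_then_j(n)
```

Python's `True` and `False` behave as 1 and 0 under multiplication, which is exactly Iverson's convention.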

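Let's apply these ideas to a useful example. Consider the array

$$\begin{bmatrix}
a_1a_1 & a_1a_2 & a_1a_3 & \dots & a_1a_n\\
a_2a_1 & a_2a_2 & a_2a_3 & \dots & a_2a_n\\
a_3a_1 & a_3a_2 & a_3a_3 & \dots & a_3a_n\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
a_na_1 & a_na_2 & a_na_3 & \dots & a_na_n
\end{bmatrix}$$

of $n^2$ products $a_ja_k$. Our goal will be to find a simple formula for

$$S_{\urcorner} = \sum_{1\le j\le k\le n} a_j a_k,$$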


(Now  is  a good 
time  to  do  warmup 

exercises  4 and  6.) 

(Or  to  check  out 
the  Snickers  bar 
languishing  in  the 
freezer.) 
the sum of all elements on or above the main diagonal of this array. Because $a_ja_k = a_ka_j$, the array is symmetrical about its main diagonal; therefore $S_{\urcorner}$ will be approximately half the sum of all the elements (except for a fudge factor that takes account of the main diagonal).

Does rocky road have fudge in it?

Such considerations motivate the following manipulations. We have

$$S_{\urcorner} = \sum_{1\le j\le k\le n} a_ja_k = \sum_{1\le k\le j\le n} a_ka_j = \sum_{1\le k\le j\le n} a_ja_k = S_{\llcorner},$$

because we can rename $(j,k)$ as $(k,j)$. Furthermore, since

$$[1\le j\le k\le n] + [1\le k\le j\le n] = [1\le j,k\le n] + [1\le j=k\le n],$$

we have

$$2S_{\urcorner} = S_{\urcorner} + S_{\llcorner} = \sum_{1\le j,k\le n} a_ja_k + \sum_{1\le j=k\le n} a_ja_k.$$

The first sum is $\bigl(\sum_{j=1}^{n} a_j\bigr)\bigl(\sum_{k=1}^{n} a_k\bigr) = \bigl(\sum_{k=1}^{n} a_k\bigr)^2$, by the general distributive law (2.28). The second sum is $\sum_{k=1}^{n} a_k^2$. Therefore we have

$$S_{\urcorner} = \sum_{1\le j\le k\le n} a_ja_k = \frac12\Bigl(\Bigl(\sum_{k=1}^{n} a_k\Bigr)^2 + \sum_{k=1}^{n} a_k^2\Bigr), \tag{2.33}$$


an  expression  for  the  upper  triangular  sum  in  terms  of  simpler  single  sums. 
Encouraged  by  such  success,  let’s  look  at  another  double  sum: 


$$S = \sum_{1\le j<k\le n} (a_k - a_j)(b_k - b_j).$$

Again we have symmetry when $j$ and $k$ are interchanged:

$$S = \sum_{1\le k<j\le n} (a_j - a_k)(b_j - b_k) = \sum_{1\le k<j\le n} (a_k - a_j)(b_k - b_j).$$

So we can add $S$ to itself, making use of the identity

$$[1\le j<k\le n] + [1\le k<j\le n] = [1\le j,k\le n] - [1\le j=k\le n]$$

to conclude that

$$2S = \sum_{1\le j,k\le n} (a_j - a_k)(b_j - b_k) - \sum_{1\le j=k\le n} (a_j - a_k)(b_j - b_k).$$
The second sum here is zero; what about the first? It expands into four separate sums, each of which is vanilla flavored:

$$\begin{aligned}
&\sum_{1\le j,k\le n} a_jb_j - \sum_{1\le j,k\le n} a_jb_k - \sum_{1\le j,k\le n} a_kb_j + \sum_{1\le j,k\le n} a_kb_k \\
&\qquad = 2\sum_{1\le j,k\le n} a_kb_k - 2\sum_{1\le j,k\le n} a_jb_k \\
&\qquad = 2n\sum_{1\le k\le n} a_kb_k - 2\Bigl(\sum_{k=1}^{n} a_k\Bigr)\Bigl(\sum_{k=1}^{n} b_k\Bigr).
\end{aligned}$$

In  the  last  step  both  sums  have  been  simplified  according  to  the  general 
distributive  law  (2.28).  If  the  manipulation  of  the  first  sum  seems  mysterious, 
here  it  is  again  in  slow  motion: 

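$$\begin{aligned}
2\sum_{1\le j,k\le n} a_kb_k &= 2\sum_{1\le k\le n}\,\sum_{1\le j\le n} a_kb_k \\
&= 2\sum_{1\le k\le n} a_kb_k \sum_{1\le j\le n} 1 \\
&= 2\sum_{1\le k\le n} a_kb_k\,n = 2n\sum_{1\le k\le n} a_kb_k.
\end{aligned}$$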


An  index  variable  that  doesn’t  appear  in  the  summand  (here  j)  can  simply 
be  eliminated  if  we  multiply  what’s  left  by  the  size  of  that  variable’s  index 
set  (here  n). 

Returning  to  where  we  left  off,  we  can  now  divide  everything  by  2 and 
rearrange  things  to  obtain  an  interesting  formula: 

$$\Bigl(\sum_{k=1}^{n} a_k\Bigr)\Bigl(\sum_{k=1}^{n} b_k\Bigr) = n\sum_{k=1}^{n} a_kb_k - \sum_{1\le j<k\le n} (a_k - a_j)(b_k - b_j). \tag{2.34}$$

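This identity yields Chebyshev's summation inequalities as a special case:

$$\Bigl(\sum_{k=1}^{n} a_k\Bigr)\Bigl(\sum_{k=1}^{n} b_k\Bigr) \le n\sum_{k=1}^{n} a_kb_k, \qquad\text{if } a_1\le\dots\le a_n \text{ and } b_1\le\dots\le b_n;$$

$$\Bigl(\sum_{k=1}^{n} a_k\Bigr)\Bigl(\sum_{k=1}^{n} b_k\Bigr) \ge n\sum_{k=1}^{n} a_kb_k, \qquad\text{if } a_1\le\dots\le a_n \text{ and } b_1\ge\dots\ge b_n.$$

(Chebyshev actually proved the analogous result for integrals instead of sums: $\bigl(\int_a^b f(x)\,dx\bigr)\bigl(\int_a^b g(x)\,dx\bigr) \le (b-a)\bigl(\int_a^b f(x)g(x)\,dx\bigr)$, if $f(x)$ and $g(x)$ are monotone nondecreasing functions.)

(In general, if $a_1\le\dots\le a_n$ and if $p$ is a permutation of $\{1,\dots,n\}$, it's possible to prove that the largest value of $\sum_{k=1}^{n} a_kb_{p(k)}$ occurs when $b_{p(1)}\le\dots\le b_{p(n)}$, and the smallest value occurs when $b_{p(1)}\ge\dots\ge b_{p(n)}$.)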
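The identities (2.33) and (2.34), and the Chebyshev inequalities that follow from (2.34), can be tried out on a small example. This is only a sketch with made-up data and invented function names; lists are indexed from 0 here, which shifts the indices of the text by one without changing any of the sums.

```python
def upper_triangle_sum(a):
    """Sum of a_j * a_k over all pairs with j <= k (on or above the diagonal)."""
    n = len(a)
    return sum(a[j] * a[k] for j in range(n) for k in range(j, n))

def cross_difference_sum(a, b):
    """Sum of (a_k - a_j)(b_k - b_j) over all pairs with j < k."""
    n = len(a)
    return sum((a[k] - a[j]) * (b[k] - b[j]) for j in range(n) for k in range(j + 1, n))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = [1, 3, 4, 9]          # nondecreasing
b = [2, 2, 5, 7]          # nondecreasing
n = len(a)

# (2.33), multiplied through by 2 to stay in integers.
assert 2 * upper_triangle_sum(a) == sum(a) ** 2 + dot(a, a)

# (2.34): the product of the sums equals n * sum(a_k b_k) minus the cross-difference sum.
assert sum(a) * sum(b) == n * dot(a, b) - cross_difference_sum(a, b)

# Chebyshev: same ordering gives <=, opposite ordering gives >=.
assert sum(a) * sum(b) <= n * dot(a, b)
b_reversed = b[::-1]      # nonincreasing
assert sum(a) * sum(b_reversed) >= n * dot(a, b_reversed)
```

When both sequences are sorted the same way, every term $(a_k - a_j)(b_k - b_j)$ is nonnegative, which is all the first inequality needs.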
My other math teacher calls this a "bijection"; maybe I'll learn to love that word some day.

And then again...


Multiple summation has an interesting connection with the general operation of changing the index of summation in single sums. We know by the commutative law that

$$\sum_{k\in K} a_k = \sum_{p(k)\in K} a_{p(k)},$$

if $p(k)$ is any permutation of the integers. But what happens when we replace $k$ by $f(j)$, where $f$ is an arbitrary function

$$f\colon J \to K$$

that takes an integer $j\in J$ into an integer $f(j)\in K$? The general formula for index replacement is

$$\sum_{j\in J} a_{f(j)} = \sum_{k\in K} a_k\,\#f^{-}(k), \tag{2.35}$$

where $\#f^{-}(k)$ stands for the number of elements in the set

$$f^{-}(k) = \{\, j \mid f(j) = k \,\},$$

that is, the number of values of $j\in J$ such that $f(j)$ equals $k$.

It's easy to prove (2.35) by interchanging the order of summation,

$$\sum_{j\in J} a_{f(j)} = \sum_{\substack{j\in J\\ k\in K}} a_k\,[f(j)=k] = \sum_{k\in K} a_k \sum_{j\in J}\,[f(j)=k],$$

since $\sum_{j\in J}[f(j)=k] = \#f^{-}(k)$. In the special case that $f$ is a one-to-one correspondence between $J$ and $K$, we have $\#f^{-}(k) = 1$ for all $k$, and the general formula (2.35) reduces to

$$\sum_{j\in J} a_{f(j)} = \sum_{f(j)\in K} a_{f(j)} = \sum_{k\in K} a_k.$$

This is the commutative law (2.17) we had before, slightly disguised.
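Formula (2.35) says that replacing the index $k$ by $f(j)$ weights each term $a_k$ by the number of $j$'s that land on it. A small sketch, with a deliberately non-injective $f$ chosen for illustration (the sets, the function, and the terms are all assumptions of this example):

```python
from collections import Counter

def sum_over_j(a, f, J):
    """Left side of (2.35): sum of a_{f(j)} over j in J."""
    return sum(a[f(j)] for j in J)

def sum_over_k(a, f, J, K):
    """Right side of (2.35): sum of a_k * #f^-(k) over k in K."""
    preimage_size = Counter(f(j) for j in J)   # preimage_size[k] = #{ j in J : f(j) = k }
    return sum(a[k] * preimage_size[k] for k in K)

J = range(-3, 4)                 # j = -3, ..., 3
K = range(0, 10)                 # large enough to contain every f(j)
a = {k: 2 * k + 1 for k in K}    # hypothetical terms a_k

def f(j):
    return j * j                 # two-to-one except at j = 0

assert sum_over_j(a, f, J) == sum_over_k(a, f, J, K) == 63
```

Values of $k$ that are never hit, such as $k = 2$, contribute nothing because their preimage is empty.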

Our examples of multiple sums so far have all involved general terms like $a_k$ or $b_k$. But this book is supposed to be concrete, so let's take a look at a multiple sum that involves actual numbers:

$$S_n = \sum_{1\le j<k\le n} \frac{1}{k-j}.$$

For example, $S_1 = 0$; $S_2 = 1$; $S_3 = \frac{1}{2-1} + \frac{1}{3-1} + \frac{1}{3-2} = \frac52$.
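The normal way to evaluate a double sum is to sum first on $j$ or first on $k$, so let's explore both options.

$$\begin{aligned}
S_n &= \sum_{1\le k\le n}\,\sum_{1\le j<k} \frac{1}{k-j} &&\text{summing first on } j\\
&= \sum_{1\le k\le n}\,\sum_{1\le k-j<k} \frac{1}{j} &&\text{replacing } j \text{ by } k-j\\
&= \sum_{1\le k\le n}\,\sum_{0<j\le k-1} \frac{1}{j} &&\text{simplifying the bounds on } j\\
&= \sum_{1\le k\le n} H_{k-1} &&\text{by (2.13), the definition of } H_{k-1}\\
&= \sum_{1\le k+1\le n} H_k &&\text{replacing } k \text{ by } k+1\\
&= \sum_{0\le k<n} H_k. &&\text{simplifying the bounds on } k
\end{aligned}$$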

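Alas! We don't know how to get a sum of harmonic numbers into closed form. If we try summing first the other way, we get

$$\begin{aligned}
S_n &= \sum_{1\le j\le n}\,\sum_{j<k\le n} \frac{1}{k-j} &&\text{summing first on } k\\
&= \sum_{1\le j\le n}\,\sum_{j<k+j\le n} \frac{1}{k} &&\text{replacing } k \text{ by } k+j\\
&= \sum_{1\le j\le n}\,\sum_{0<k\le n-j} \frac{1}{k} &&\text{simplifying the bounds on } k\\
&= \sum_{1\le j\le n} H_{n-j} &&\text{by (2.13), the definition of } H_{n-j}\\
&= \sum_{1\le n-j\le n} H_j &&\text{replacing } j \text{ by } n-j\\
&= \sum_{0\le j<n} H_j. &&\text{simplifying the bounds on } j
\end{aligned}$$

Get out the whip.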


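We're back at the same impasse.

But there's another way to proceed, if we replace $k$ by $k+j$ before deciding to reduce $S_n$ to a sum of sums:

$$\begin{aligned}
S_n &= \sum_{1\le j<k\le n} \frac{1}{k-j} &&\text{recopying the given sum}\\
&= \sum_{1\le j<k+j\le n} \frac{1}{k} &&\text{replacing } k \text{ by } k+j\\
&= \sum_{1\le k\le n}\,\sum_{1\le j\le n-k} \frac{1}{k} &&\text{summing first on } j\\
&= \sum_{1\le k\le n} \frac{n-k}{k} &&\text{the sum on } j \text{ is trivial}\\
&= \sum_{1\le k\le n} \frac{n}{k} - \sum_{1\le k\le n} 1 &&\text{by the associative law}\\
&= n\Bigl(\sum_{1\le k\le n} \frac{1}{k}\Bigr) - n &&\text{by gosh}\\
&= nH_n - n. &&\text{by (2.13), the definition of } H_n
\end{aligned}$$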


It was smart to say $k \le n$ instead of $k \le n-1$ in this derivation. Simple bounds save energy.

Aha! We've found $S_n$. Combining this with the false starts we made gives us a further identity as a bonus:

$$\sum_{0\le k<n} H_k = nH_n - n. \tag{2.36}$$


We can understand the trick that worked here in two ways, one algebraic and one geometric. (1) Algebraically, if we have a double sum whose terms involve $k+f(j)$, where $f$ is an arbitrary function, this example indicates that it's a good idea to try replacing $k$ by $k-f(j)$ and summing on $j$. (2) Geometrically, we can look at this particular sum $S_n$ as follows, in the case $n = 4$:

$$\begin{array}{c|cccc}
 & k=1 & k=2 & k=3 & k=4\\ \hline
j=1 & & \frac11 & \frac12 & \frac13\\
j=2 & & & \frac11 & \frac12\\
j=3 & & & & \frac11\\
j=4 & & & &
\end{array}$$

Our first attempts, summing first on $j$ (by columns) or on $k$ (by rows), gave us $H_1 + H_2 + H_3 = H_3 + H_2 + H_1$. The winning idea was essentially to sum by diagonals, getting $\frac31 + \frac22 + \frac13$.
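The three ways of adding up the triangle (by columns, by rows, by diagonals) can be compared directly. The sketch below uses exact fractions; the function names are invented for this illustration and are not from the text.

```python
from fractions import Fraction

def harmonic(n):
    """H_n = 1/1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def double_sum(n):
    """S_n: sum of 1/(k - j) over all pairs with 1 <= j < k <= n."""
    return sum(Fraction(1, k - j) for j in range(1, n + 1) for k in range(j + 1, n + 1))

def sum_by_diagonals(n):
    """The diagonal at distance k from the main diagonal holds n - k copies of 1/k."""
    return sum(Fraction(n - k, k) for k in range(1, n + 1))

for n in range(0, 12):
    closed_form = n * harmonic(n) - n
    assert double_sum(n) == sum_by_diagonals(n) == closed_form
    # The two "false starts" give identity (2.36).
    assert sum(harmonic(k) for k in range(n)) == closed_form

print(double_sum(3))   # prints 5/2
```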


2.5  GENERAL  METHODS 

Now  let’s  consolidate  what  we’ve  learned,  by  looking  at  a single 
example  from  several  different  angles.  On  the  next  few  pages  we’re  going  to 
try to find a closed form for the sum of the first $n$ squares, which we'll call $\Box_n$:

$$\Box_n = \sum_{0\le k\le n} k^2, \qquad \text{for } n \ge 0. \tag{2.37}$$

We’ll  see  that  there  are  at  least  seven  different  ways  to  solve  this  problem, 
and  in  the  process  we’ll  learn  useful  strategies  for  attacking  sums  in  general. 
First, as usual, we look at some small cases.

$$\begin{array}{c|ccccccccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12\\ \hline
n^2 & 0 & 1 & 4 & 9 & 16 & 25 & 36 & 49 & 64 & 81 & 100 & 121 & 144\\
\Box_n & 0 & 1 & 5 & 14 & 30 & 55 & 91 & 140 & 204 & 285 & 385 & 506 & 650
\end{array}$$

No closed form for $\Box_n$ is immediately evident; but when we do find one, we can use these values as a check.


Method  0:  You  could  look  it  up. 

A problem  like  the  sum  of  the  first  n squares  has  probably  been  solved 
before,  so  we  can  most  likely  find  the  solution  in  a handy  reference  book. 
Sure  enough,  page  72  of  the  CRC  Standard  Mathematical  Tables  [24]  has  the 
answer: 


$$\Box_n = \frac{n(n+1)(2n+1)}{6}, \qquad \text{for } n \ge 0. \tag{2.38}$$

Just to make sure we haven't misread it, we check that this formula correctly gives $\Box_5 = 5\cdot 6\cdot 11/6 = 55$. Incidentally, page 72 of the CRC Tables has further information about the sums of cubes, ..., tenth powers.

The definitive reference for mathematical formulas is the Handbook of Mathematical Functions, edited by Abramowitz and Stegun [2]. Pages 813-814 of that book list the values of $\Box_n$ for $n \le 100$; and pages 804 and 809 exhibit formulas equivalent to (2.38), together with the analogous formulas for sums of cubes, ..., fifteenth powers, with or without alternating signs.

But the best source for answers to questions about sequences is an amazing little book called the Handbook of Integer Sequences, by Sloane [270], which lists thousands of sequences by their numerical values. If you come up with a recurrence that you suspect has already been studied, all you have to do is compute enough terms to distinguish your recurrence from other famous ones; then chances are you'll find a pointer to the relevant literature in Sloane's Handbook. For example, 1, 5, 14, 30, ... turns out to be Sloane's sequence number 1574, and it's called the sequence of "square pyramidal numbers" (because there are $\Box_n$ balls in a pyramid that has a square base of $n^2$ balls). Sloane gives three references, one of which is to the handbook of Abramowitz and Stegun that we've already mentioned.

Still  another  way  to  probe  the  world’s  store  of  accumulated  mathematical 
wisdom  is  to  use  a computer  program  (such  as  MACSYMA)  that  provides 
tools  for  symbolic  manipulation.  Such  programs  are  indispensable,  especially 
for  people  who  need  to  deal  with  large  formulas. 

It's good to be familiar with standard sources of information, because they can be extremely helpful. But Method 0 isn't really consistent with the spirit of this book, because we want to know how to figure out the answers by ourselves. (The look-up method is limited to problems that other people have decided are worth considering; a new problem won't be there.)

(Harder sums can be found in Hansen's comprehensive table [147].)

Or, at least to problems having the same answers as problems that other people have decided to consider.

Method  1:  Guess  the  answer,  prove  it  by  induction. 

Perhaps  a little  bird  has  told  us  the  answer  to  a problem,  or  we  have 
arrived  at  a closed  form  by  some  other  less-than-rigorous  means.  Then  we 
merely  have  to  prove  that  it  is  correct. 

We might, for example, have noticed that the values of $\Box_n$ have rather small prime factors, so we may have come up with formula (2.38) as something that works for all small values of $n$. We might also have conjectured the equivalent formula

$$\Box_n = \frac{n(n+\frac12)(n+1)}{3}, \qquad \text{for } n \ge 0, \tag{2.39}$$

which is nicer because it's easier to remember. The preponderance of the evidence supports (2.39), but we must prove our conjectures beyond all reasonable doubt. Mathematical induction was invented for this purpose.

"Well, Your Honor, we know that $\Box_0 = 0 = 0(0+\frac12)(0+1)/3$, so the basis is easy. For the induction, suppose that $n > 0$, and assume that (2.39) holds when $n$ is replaced by $n-1$. Since

$$\Box_n = \Box_{n-1} + n^2,$$

we have

$$\begin{aligned}
3\Box_n &= (n-1)(n-\tfrac12)(n) + 3n^2\\
&= (n^3 - \tfrac32 n^2 + \tfrac12 n) + 3n^2\\
&= (n^3 + \tfrac32 n^2 + \tfrac12 n)\\
&= n(n+\tfrac12)(n+1).
\end{aligned}$$

Therefore (2.39) indeed holds, beyond a reasonable doubt, for all $n \ge 0$." Judge Wapner, in his infinite wisdom, agrees.
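Before (or after) going to court, it does no harm to gather evidence by machine. The sketch below checks (2.38) and (2.39) against the definition for small $n$ and also tests the induction step numerically; it is illustrative only, the function names are invented, and a finite check is of course no substitute for the proof.

```python
from fractions import Fraction

def box(n):
    """The sum of squares 0^2 + 1^2 + ... + n^2, computed directly from (2.37)."""
    return sum(k * k for k in range(n + 1))

def closed_form_238(n):
    """Formula (2.38): n(n+1)(2n+1)/6."""
    return n * (n + 1) * (2 * n + 1) // 6

def closed_form_239(n):
    """Formula (2.39): n(n+1/2)(n+1)/3, in exact rational arithmetic."""
    return n * (n + Fraction(1, 2)) * (n + 1) / 3

for n in range(200):
    assert box(n) == closed_form_238(n) == closed_form_239(n)

# The induction step: the closed form must grow by exactly n^2 from n-1 to n.
for n in range(1, 200):
    assert closed_form_239(n) - closed_form_239(n - 1) == n * n

print([box(n) for n in range(13)])
# prints [0, 1, 5, 14, 30, 55, 91, 140, 204, 285, 385, 506, 650]
```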

Induction  has  its  place,  and  it  is  somewhat  more  defensible  than  trying 
to  look  up  the  answer.  But  it’s  still  not  really  what  we’re  seeking.  All  of 
the  other  sums  we  have  evaluated  so  far  in  this  chapter  have  been  conquered 
without induction; we should likewise be able to determine a sum like $\Box_n$
from  scratch.  Flashes  of  inspiration  should  not  be  necessary.  We  should  be 
able  to  do  sums  even  on  our  less  creative  days. 

Method  2:  Perturb  the  sum. 

So let's go back to the perturbation method that worked so well for the geometric progression (2.25). We extract the first and last terms of $\Box_{n+1}$ in order to get an equation for $\Box_n$:

$$\begin{aligned}
\Box_n + (n+1)^2 &= \sum_{0\le k\le n} (k+1)^2 = \sum_{0\le k\le n} (k^2 + 2k + 1)\\
&= \sum_{0\le k\le n} k^2 + 2\sum_{0\le k\le n} k + \sum_{0\le k\le n} 1\\
&= \Box_n + 2\sum_{0\le k\le n} k + (n+1).
\end{aligned}$$

Oops, the $\Box_n$'s cancel each other. Occasionally, despite our best efforts, the perturbation method produces something like $\Box_n = \Box_n$, so we lose.

On the other hand, this derivation is not a total loss; it does reveal a way to sum the first $n$ integers in closed form,

$$2\sum_{0\le k\le n} k = (n+1)^2 - (n+1),$$

even though we'd hoped to discover the sum of first integers squared. Could it be that if we start with the sum of the integers cubed, which we might call $\boxdot_n$, we will get an expression for the integers squared? Let's try it.

$$\begin{aligned}
\boxdot_n + (n+1)^3 &= \sum_{0\le k\le n} (k+1)^3 = \sum_{0\le k\le n} (k^3 + 3k^2 + 3k + 1)\\
&= \boxdot_n + 3\Box_n + 3\,\frac{(n+1)n}{2} + (n+1).
\end{aligned}$$

Sure enough, the $\boxdot_n$'s cancel, and we have enough information to determine $\Box_n$ without relying on induction:

$$\begin{aligned}
3\Box_n &= (n+1)^3 - 3(n+1)n/2 - (n+1)\\
&= (n+1)(n^2 + 2n + 1 - \tfrac32 n - 1) = (n+1)(n+\tfrac12)n.
\end{aligned}$$

Seems more like a draw.

Method 2′: Perturb your TA.

Method 3: Build a repertoire.

A slight generalization of the recurrence (2.7) will also suffice for summands involving $n^2$. The solution to

$$\begin{aligned}
R_0 &= \alpha;\\
R_n &= R_{n-1} + \beta + \gamma n + \delta n^2, \qquad \text{for } n > 0,
\end{aligned} \tag{2.40}$$

will be of the general form

$$R_n = A(n)\alpha + B(n)\beta + C(n)\gamma + D(n)\delta; \tag{2.41}$$

and we have already determined $A(n)$, $B(n)$, and $C(n)$, because (2.41) is the same as (2.7) when $\delta = 0$. If we now plug in $R_n = n^3$, we find that $n^3$ is the solution when $\alpha = 0$, $\beta = 1$, $\gamma = -3$, $\delta = 3$. Hence

$$3D(n) - 3C(n) + B(n) = n^3;$$

this determines $D(n)$.

We're interested in the sum $\Box_n$, which equals $\Box_{n-1} + n^2$; thus we get $\Box_n = R_n$ if we set $\alpha = \beta = \gamma = 0$ and $\delta = 1$ in (2.41). Consequently $\Box_n = D(n)$. We needn't do the algebra to compute $D(n)$ from $B(n)$ and $C(n)$, since we already know what the answer will be; but doubters among us should be reassured to find that

$$3D(n) = n^3 + 3C(n) - B(n) = n^3 + 3\,\frac{(n+1)n}{2} - n = n(n+\tfrac12)(n+1).$$
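The repertoire argument can be replayed numerically. The sketch below simulates recurrence (2.40) directly and compares it with the general form (2.41). It assumes the coefficient functions $A(n) = 1$, $B(n) = n$, and $C(n) = (n+1)n/2$ carried over from the solution of (2.7); those, like the function names, are assumptions of this example rather than something derived in the present section.

```python
from fractions import Fraction

def R(n, alpha, beta, gamma, delta):
    """Run recurrence (2.40): R_0 = alpha, R_m = R_{m-1} + beta + gamma*m + delta*m^2."""
    r = alpha
    for m in range(1, n + 1):
        r = r + beta + gamma * m + delta * m * m
    return r

def A(n):
    return 1                             # assumed from (2.7)

def B(n):
    return n                             # assumed from (2.7)

def C(n):
    return Fraction(n * (n + 1), 2)      # assumed from (2.7)

def D(n):
    """Solve 3D(n) - 3C(n) + B(n) = n^3 for D(n)."""
    return (n ** 3 + 3 * C(n) - B(n)) / 3

for n in range(30):
    # The repertoire item: R_n = n^3 arises from alpha=0, beta=1, gamma=-3, delta=3.
    assert R(n, 0, 1, -3, 3) == n ** 3
    # The general form (2.41) agrees with the recurrence for assorted parameters.
    for alpha, beta, gamma, delta in [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),
                                      (0, 0, 0, 1), (2, -5, 7, 11)]:
        expected = A(n) * alpha + B(n) * beta + C(n) * gamma + D(n) * delta
        assert R(n, alpha, beta, gamma, delta) == expected
    # With only delta switched on, R_n is the sum of the first n squares.
    assert D(n) == sum(k * k for k in range(n + 1))
```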

Method  4:  Replace  sums  by  integrals. 

People who have been raised on calculus instead of discrete mathematics tend to be more familiar with $\int$ than with $\sum$, so they find it natural to try changing $\sum$ to $\int$. One of our goals in this book is to become so comfortable with $\sum$ that we'll think $\int$ is more difficult than $\sum$ (at least for exact results). But still, it's a good idea to explore the relation between $\sum$ and $\int$, since summation and integration are based on very similar ideas.

In calculus, an integral can be regarded as the area under a curve, and we can approximate this area by adding up the areas of long, skinny rectangles that touch the curve. We can also go the other way if a collection of long, skinny rectangles is given: Since $\Box_n$ is the sum of the areas of rectangles whose sizes are $1\times 1$, $1\times 4$, ..., $1\times n^2$, it is approximately equal to the area under the curve $f(x) = x^2$ between 0 and $n$.

The horizontal scale here is ten times the vertical scale.

The area under this curve is $\int_0^n x^2\,dx = n^3/3$; therefore we know that $\Box_n$ is approximately $\frac13 n^3$.




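One way to use this fact is to examine the error in the approximation, $E_n = \Box_n - \frac13 n^3$. Since $\Box_n$ satisfies the recurrence $\Box_n = \Box_{n-1} + n^2$, we find that $E_n$ satisfies the simpler recurrence

$$\begin{aligned}
E_n = \Box_n - \tfrac13 n^3 &= \Box_{n-1} + n^2 - \tfrac13 n^3 = E_{n-1} + \tfrac13 (n-1)^3 + n^2 - \tfrac13 n^3\\
&= E_{n-1} + n - \tfrac13.
\end{aligned}$$

Another way to pursue the integral approach is to find a formula for $E_n$ by summing the areas of the wedge-shaped error terms. We have

$$\begin{aligned}
\Box_n - \int_0^n x^2\,dx &= \sum_{k=1}^{n} \Bigl(k^2 - \int_{k-1}^{k} x^2\,dx\Bigr)\\
&= \sum_{k=1}^{n} \Bigl(k^2 - \frac{k^3 - (k-1)^3}{3}\Bigr) = \sum_{k=1}^{n} \bigl(k - \tfrac13\bigr).
\end{aligned}$$

Either way, we could find $E_n$ and then $\Box_n$.

This is for people addicted to calculus.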
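To see the error term of the integral approximation in action, here is a short sketch in exact arithmetic. It checks the recurrence $E_n = E_{n-1} + n - \frac13$ and the wedge-sum formula, and then finishes the job that the text leaves open, under the assumption that the sum of the first $n$ integers is $n(n+1)/2$. The function names are invented for illustration.

```python
from fractions import Fraction

def box(n):
    """Sum of the first n squares, straight from the definition."""
    return sum(k * k for k in range(n + 1))

def error(n):
    """E_n: the sum of squares minus the area n^3/3 under the curve x^2."""
    return box(n) - Fraction(n ** 3, 3)

third = Fraction(1, 3)
for n in range(1, 50):
    # The simpler recurrence satisfied by the error term.
    assert error(n) - error(n - 1) == n - third
    # The wedge-shaped pieces: E_n is the sum of (k - 1/3) for k = 1..n.
    assert error(n) == sum(k - third for k in range(1, n + 1))
    # Finishing: E_n = n(n+1)/2 - n/3, so the sum of squares is n^3/3 + E_n.
    assert box(n) == Fraction(n ** 3, 3) + Fraction(n * (n + 1), 2) - Fraction(n, 3)
```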


Method  5:  Expand  and  contract. 

Yet another way to discover a closed form for $\Box_n$ is to replace the original sum by a seemingly more complicated double sum that can actually be simplified if we massage it properly:

$$\begin{aligned}
\Box_n = \sum_{1\le k\le n} k^2 &= \sum_{1\le j\le k\le n} k\\
&= \sum_{1\le j\le n}\,\sum_{j\le k\le n} k\\
&= \sum_{1\le j\le n} \Bigl(\frac{j+n}{2}\Bigr)(n-j+1)\\
&= \tfrac12 \sum_{1\le j\le n} \bigl(n(n+1) + j - j^2\bigr)\\
&= \tfrac12 n^2(n+1) + \tfrac14 n(n+1) - \tfrac12 \Box_n = \tfrac12 n(n+\tfrac12)(n+1) - \tfrac12 \Box_n.
\end{aligned}$$

Going from a single sum to a double sum may appear at first to be a backward step, but it's actually progress, because it produces sums that are easier to work with. We can't expect to solve every problem by continually simplifying, simplifying, and simplifying: You can't scale the highest mountain peaks by climbing only uphill!

(The last step here is something like the last step of the perturbation method, because we get an equation with the unknown quantity on both sides.)


Method  6:  Use  finite  calculus. 

Method  7:  Use  generating  functions. 

Stay tuned for still more exciting calculations of $\Box_n = \sum_{k=0}^{n} k^2$, as we
learn  further  techniques  in  the  next  section  and  in  later  chapters. 




As  opposed  to  a 
cassette  function. 


Math  power. 


2.6  FINITE  AND  INFINITE  CALCULUS 

We've learned a variety of ways to deal with sums directly. Now it's time to acquire a broader perspective, by looking at the problem of summation from a higher level. Mathematicians have developed a "finite calculus," analogous to the more traditional infinite calculus, by which it's possible to approach summation in a nice, systematic fashion.

Infinite calculus is based on the properties of the derivative operator $D$, defined by

$$Df(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}.$$

Finite calculus is based on the properties of the difference operator $\Delta$, defined by

$$\Delta f(x) = f(x+1) - f(x). \tag{2.42}$$

This is the finite analog of the derivative in which we restrict ourselves to positive integer values of $h$. Thus, $h = 1$ is the closest we can get to the "limit" as $h \to 0$, and $\Delta f(x)$ is the value of $(f(x+h) - f(x))/h$ when $h = 1$.

The symbols $D$ and $\Delta$ are called operators because they operate on functions to give new functions; they are functions of functions that produce functions. If $f$ is a suitably smooth function of real numbers to real numbers, then $Df$ is also a function from reals to reals. And if $f$ is any real-to-real function, so is $\Delta f$. The values of the functions $Df$ and $\Delta f$ at a point $x$ are given by the definitions above.

Early on in calculus we learn how $D$ operates on the powers $f(x) = x^m$. In such cases $Df(x) = mx^{m-1}$. We can write this informally with $f$ omitted,

$$D(x^m) = mx^{m-1}.$$

It would be nice if the $\Delta$ operator would produce an equally elegant result; unfortunately it doesn't. We have, for example,

$$\Delta(x^3) = (x+1)^3 - x^3 = 3x^2 + 3x + 1.$$

But there is a type of "mth power" that does transform nicely under $\Delta$, and this is what makes finite calculus interesting. Such newfangled mth powers are defined by the rule

$$x^{\underline{m}} = \overbrace{x(x-1)\dots(x-m+1)}^{m\text{ factors}}, \qquad \text{integer } m \ge 0. \tag{2.43}$$

Notice the little straight line under the $m$; this implies that the $m$ factors are supposed to go down and down, stepwise. There's also a corresponding definition where the factors go up and up:

$$x^{\overline{m}} = \overbrace{x(x+1)\dots(x+m-1)}^{m\text{ factors}}, \qquad \text{integer } m \ge 0. \tag{2.44}$$

When $m = 0$, we have $x^{\underline{0}} = x^{\overline{0}} = 1$, because a product of no factors is conventionally taken to be 1 (just as a sum of no terms is conventionally 0).

The quantity $x^{\underline{m}}$ is called "x to the m falling," if we have to read it aloud; similarly, $x^{\overline{m}}$ is "x to the m rising." These functions are also called falling factorial powers and rising factorial powers, since they are closely related to the factorial function $n! = n(n-1)\dots(1)$. In fact, $n! = n^{\underline{n}} = 1^{\overline{n}}$.

Several other notations for factorial powers appear in the mathematical literature, notably "Pochhammer's symbol" $(x)_m$ for $x^{\overline{m}}$ or $x^{\underline{m}}$; notations like $x^{(m)}$ or $x_{(m)}$ are also seen for $x^{\underline{m}}$. But the underline/overline convention is catching on, because it's easy to write, easy to remember, and free of redundant parentheses.

Falling powers $x^{\underline{m}}$ are especially nice with respect to $\Delta$. We have

$$\begin{aligned}
\Delta(x^{\underline{m}}) &= (x+1)^{\underline{m}} - x^{\underline{m}}\\
&= (x+1)x\dots(x-m+2) - x\dots(x-m+2)(x-m+1)\\
&= m\,x(x-1)\dots(x-m+2),
\end{aligned}$$

hence the finite calculus has a handy law to match $D(x^m) = mx^{m-1}$:

$$\Delta(x^{\underline{m}}) = m\,x^{\underline{m-1}}. \tag{2.45}$$

This is the basic factorial fact.
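The difference operator and the factorial powers translate directly into code. The following sketch (with function names chosen for this illustration, not taken from the text) checks the law (2.45) on a grid of integers, confirms that ordinary cubes do not behave as nicely, and verifies the link to factorials.

```python
from math import factorial

def falling(x, m):
    """x to the m falling: x(x-1)...(x-m+1), an empty product (= 1) when m = 0."""
    result = 1
    for i in range(m):
        result *= x - i
    return result

def rising(x, m):
    """x to the m rising: x(x+1)...(x+m-1), an empty product (= 1) when m = 0."""
    result = 1
    for i in range(m):
        result *= x + i
    return result

def delta(f):
    """The difference operator: maps f to the function x -> f(x+1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

for x in range(-5, 11):
    # Ordinary powers pick up extra terms under delta ...
    assert delta(lambda t: t ** 3)(x) == 3 * x * x + 3 * x + 1
    # ... but falling powers obey the analog of D(x^m) = m x^(m-1), law (2.45).
    for m in range(1, 7):
        assert delta(lambda t: falling(t, m))(x) == m * falling(x, m - 1)

# n! = n falling n = 1 rising n.
for n in range(10):
    assert factorial(n) == falling(n, n) == rising(1, n)
```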

The operator $D$ of infinite calculus has an inverse, the anti-derivative (or integration) operator $\int$. The Fundamental Theorem of Calculus relates $D$ to $\int$:

$$g(x) = Df(x) \quad\text{if and only if}\quad \int g(x)\,dx = f(x) + C.$$

Here $\int g(x)\,dx$, the indefinite integral of $g(x)$, is the class of functions whose derivative is $g(x)$. Analogously, $\Delta$ has as an inverse, the anti-difference (or summation) operator $\sum$; and there's another Fundamental Theorem:

$$g(x) = \Delta f(x) \quad\text{if and only if}\quad \sum g(x)\,\delta x = f(x) + C. \tag{2.46}$$

Here $\sum g(x)\,\delta x$, the indefinite sum of $g(x)$, is the class of functions whose difference is $g(x)$. (Notice that the lowercase $\delta$ relates to uppercase $\Delta$ as $d$ relates to $D$.) The "C" for indefinite integrals is an arbitrary constant; the "C" for indefinite sums is any function $p(x)$ such that $p(x+1) = p(x)$. For example, $C$ might be the periodic function $a + b\sin 2\pi x$; such functions get washed out when we take differences, just as constants get washed out when we take derivatives. At integer values of $x$, the function $C$ is constant.

Mathematical terminology is sometimes crazy: Pochhammer [234] actually used the notation $(x)_m$ for the binomial coefficient $\binom{x}{m}$, not for factorial powers.

"Quemadmodum ad differentiam denotandam usi sumus signo $\Delta$, ita summam indicabimus signo $\Sigma$. ... ex quo æquatio $z = \Delta y$, si invertatur, dabit quoque $y = \Sigma z + C$."
-- L. Euler [88]

You call this a punch line?

Now we're almost ready for the punch line. Infinite calculus also has definite integrals: If $g(x) = Df(x)$, then

$$\int_a^b g(x)\,dx = f(x)\Big|_a^b = f(b) - f(a).$$

Therefore finite calculus, ever mimicking its more famous cousin, has definite sums: If $g(x) = \Delta f(x)$, then

$$\sum\nolimits_a^b g(x)\,\delta x = f(x)\Big|_a^b = f(b) - f(a). \tag{2.47}$$

This formula gives a meaning to the notation $\sum_a^b g(x)\,\delta x$, just as the previous formula defines $\int_a^b g(x)\,dx$.

But what does $\sum_a^b g(x)\,\delta x$ really mean, intuitively? We’ve defined it by analogy, not by necessity. We want the analogy to hold, so that we can easily remember the rules of finite calculus; but the notation will be useless if we don’t understand its significance. Let’s try to deduce its meaning by looking first at some special cases, assuming that $g(x) = \Delta f(x) = f(x+1) - f(x)$. If $b = a$, we have

$$\sum\nolimits_a^a g(x)\,\delta x = f(a) - f(a) = 0.$$

Next, if $b = a + 1$, the result is

$$\sum\nolimits_a^{a+1} g(x)\,\delta x = f(a+1) - f(a) = g(a).$$

More generally, if $b$ increases by 1, we have

$$\begin{aligned}
\sum\nolimits_a^{b+1} g(x)\,\delta x - \sum\nolimits_a^b g(x)\,\delta x &= \bigl(f(b+1) - f(a)\bigr) - \bigl(f(b) - f(a)\bigr)\\
&= f(b+1) - f(b) = g(b).
\end{aligned}$$

These observations, and mathematical induction, allow us to deduce exactly what $\sum_a^b g(x)\,\delta x$ means in general, when $a$ and $b$ are integers with $b \ge a$:

$$\sum\nolimits_a^b g(x)\,\delta x = \sum_{k=a}^{b-1} g(k) = \sum_{a \le k < b} g(k), \quad\text{for integers } b \ge a. \tag{2.48}$$

In other words, the definite sum is the same as an ordinary sum with limits, but excluding the value at the upper limit.
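Rule (2.48) is easy to spot-check numerically. The Python sketch below is our own illustration, not part of the text; the helper names `difference` and `definite_sum` and the sample function `f` are assumptions chosen for the example.

```python
def difference(f):
    """Return the function x -> f(x + 1) - f(x), the finite difference of f."""
    return lambda x: f(x + 1) - f(x)

def definite_sum(g, a, b):
    """Definite sum of g from a to b for integers b >= a: add g(k) for a <= k < b."""
    return sum(g(k) for k in range(a, b))

# Hypothetical sample function; any f defined on the integers would do.
f = lambda x: x**3 - 2 * x
g = difference(f)

# (2.47)/(2.48): the sum over a <= k < b collapses to f(b) - f(a).
for a in range(-3, 4):
    for b in range(a, a + 6):
        assert definite_sum(g, a, b) == f(b) - f(a)
```

Note that `range(a, b)` excludes `b`, which matches the convention that the definite sum omits the value at the upper limit.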
Let’s try to recap this, in a slightly different way. Suppose we’ve been given an unknown sum that’s supposed to be evaluated in closed form, and suppose we can write it in the form $\sum_{a \le k < b} g(k) = \sum_a^b g(x)\,\delta x$. The theory of finite calculus tells us that we can express the answer as $f(b) - f(a)$, if we can only find an indefinite sum or anti-difference function $f$ such that $g(x) = f(x+1) - f(x)$. One way to understand this principle is to write $\sum_{a \le k < b} g(k)$ out in full, using the three-dots notation:

$$\begin{aligned}
\sum_{a \le k < b} \bigl(f(k+1) - f(k)\bigr) = {} & \bigl(f(a+1) - f(a)\bigr) + \bigl(f(a+2) - f(a+1)\bigr) + \cdots\\
& + \bigl(f(b-1) - f(b-2)\bigr) + \bigl(f(b) - f(b-1)\bigr).
\end{aligned}$$

Everything on the right-hand side cancels, except $f(b) - f(a)$; so $f(b) - f(a)$ is the value of the sum. (Sums of the form $\sum_{a \le k < b} (f(k+1) - f(k))$ are often called telescoping, by analogy with a collapsed telescope, because the thickness of a collapsed telescope is determined solely by the outer radius of the outermost tube and the inner radius of the innermost tube.)

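But rule (2.48) applies only when $b \ge a$; what happens if $b < a$? Well, (2.47) says that we must have

$$\begin{aligned}
\sum\nolimits_a^b g(x)\,\delta x &= f(b) - f(a)\\
&= -\bigl(f(a) - f(b)\bigr) = -\sum\nolimits_b^a g(x)\,\delta x.
\end{aligned}$$

This is analogous to the corresponding equation for definite integration. A similar argument proves $\sum_a^b + \sum_b^c = \sum_a^c$, the summation analog of the identity $\int_a^b + \int_b^c = \int_a^c$. In full garb,

$$\sum\nolimits_a^b g(x)\,\delta x + \sum\nolimits_b^c g(x)\,\delta x = \sum\nolimits_a^c g(x)\,\delta x, \tag{2.49}$$

for all integers $a$, $b$, and $c$.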

At this point a few of us are probably starting to wonder what all these parallels and analogies buy us. Well for one, definite summation gives us a simple way to compute sums of falling powers: The basic laws (2.45), (2.47), and (2.48) imply the general law

$$\sum_{0 \le k < n} k^{\underline{m}} = \frac{k^{\underline{m+1}}}{m+1}\bigg|_0^n = \frac{n^{\underline{m+1}}}{m+1}, \quad\text{for integers } m, n \ge 0. \tag{2.50}$$


And  all  this  time 
I thought  it  was 
telescoping  because 
it  collapsed  from  a 
very  long  expression 
to  a very  short  one. 


Others  have  been 
wondering  this  for 
some  time  now. 


This formula is easy to remember because it’s so much like the familiar $\int_0^n x^m\,dx = n^{m+1}/(m+1)$.
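Law (2.50) can be checked by brute force for small cases. The following Python sketch is our own illustration; the function names `falling` and `sum_falling` are assumptions, and the identity is tested with the denominator cleared so that everything stays in integers.

```python
def falling(x, m):
    """Falling factorial power x(x-1)...(x-m+1) for an integer m >= 0."""
    result = 1
    for i in range(m):
        result *= x - i
    return result

def sum_falling(m, n):
    """Brute-force sum of the m-th falling power of k over 0 <= k < n."""
    return sum(falling(k, m) for k in range(n))

for m in range(5):
    for n in range(8):
        # (2.50), multiplied through by m + 1
        assert (m + 1) * sum_falling(m, n) == falling(n, m + 1)
```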
In particular, when $m = 1$ we have $k^{\underline{1}} = k$, so the principles of finite calculus give us an easy way to remember the fact that

$$\sum_{0 \le k < n} k = \frac{n^{\underline{2}}}{2} = n(n-1)/2.$$

The definite-sum method also gives us an inkling that sums over the range $0 \le k < n$ often turn out to be simpler than sums over $1 \le k \le n$; the former are just $f(n) - f(0)$, while the latter must be evaluated as $f(n+1) - f(1)$.

Ordinary powers can also be summed in this new way, if we first express them in terms of falling powers. For example,

$$k^2 = k^{\underline{2}} + k^{\underline{1}},$$

hence

$$\sum_{0 \le k < n} k^2 = \frac{n^{\underline{3}}}{3} + \frac{n^{\underline{2}}}{2} = \tfrac13 n(n-1)\bigl(n - 2 + \tfrac32\bigr) = \tfrac13 n\bigl(n - \tfrac12\bigr)(n-1).$$

Replacing $n$ by $n + 1$ gives us yet another way to compute the value of our old friend $\Box_n = \sum_{0 \le k \le n} k^2$ in closed form.

With friends like
this . . .

Gee, that was pretty easy. In fact, it was easier than any of the umpteen other ways that beat this formula to death in the previous section. So let’s try to go up a notch, from squares to cubes: A simple calculation shows that

$$k^3 = k^{\underline{3}} + 3k^{\underline{2}} + k^{\underline{1}}.$$

(It’s always possible to convert between ordinary powers and factorial powers by using Stirling numbers, which we will study in Chapter 6.) Thus

$$\sum_{a \le k < b} k^3 = \biggl(\frac{k^{\underline{4}}}{4} + k^{\underline{3}} + \frac{k^{\underline{2}}}{2}\biggr)\bigg|_a^b.$$

Falling powers are therefore very nice for sums. But do they have any other redeeming features? Must we convert our old friendly ordinary powers to falling powers before summing, but then convert back before we can do anything else? Well, no, it’s often possible to work directly with factorial powers, because they have additional properties. For example, just as we have $(x+y)^2 = x^2 + 2xy + y^2$, it turns out that $(x+y)^{\underline{2}} = x^{\underline{2}} + 2x^{\underline{1}}y^{\underline{1}} + y^{\underline{2}}$, and the same analogy holds between $(x+y)^m$ and $(x+y)^{\underline{m}}$. (This “factorial binomial theorem” is proved in exercise 5.37.)

So far we’ve considered only falling powers that have nonnegative exponents. To extend the analogies with ordinary powers to negative exponents, we need an appropriate definition of $x^{\underline{m}}$ for $m < 0$. Looking at the sequence

$$\begin{aligned}
x^{\underline{3}} &= x(x-1)(x-2),\\
x^{\underline{2}} &= x(x-1),\\
x^{\underline{1}} &= x,\\
x^{\underline{0}} &= 1,
\end{aligned}$$

we notice that to get from $x^{\underline{3}}$ to $x^{\underline{2}}$ to $x^{\underline{1}}$ to $x^{\underline{0}}$ we divide by $x - 2$, then by $x - 1$, then by $x$. It seems reasonable (if not imperative) that we should divide by $x + 1$ next, to get from $x^{\underline{0}}$ to $x^{\underline{-1}}$, thereby making $x^{\underline{-1}} = 1/(x+1)$. Continuing, the first few negative-exponent falling powers are

$$\begin{aligned}
x^{\underline{-1}} &= \frac{1}{x+1},\\
x^{\underline{-2}} &= \frac{1}{(x+1)(x+2)},\\
x^{\underline{-3}} &= \frac{1}{(x+1)(x+2)(x+3)},
\end{aligned}$$

and our general definition for negative falling powers is

$$x^{\underline{-m}} = \frac{1}{(x+1)(x+2)\ldots(x+m)}, \quad\text{for } m > 0. \tag{2.51}$$

(It’s also possible to define falling powers for real or even complex $m$, but we will defer that until Chapter 5.)

With this definition, falling powers have additional nice properties. Perhaps the most important is a general law of exponents, analogous to the law

$$x^{m+n} = x^m x^n$$

for ordinary powers. The falling-power version is

$$x^{\underline{m+n}} = x^{\underline{m}}\,(x-m)^{\underline{n}}, \quad\text{integers } m \text{ and } n. \tag{2.52}$$

For example, $x^{\underline{2+3}} = x^{\underline{2}}\,(x-2)^{\underline{3}}$; and with a negative $n$ we have

$$x^{\underline{2-3}} = x^{\underline{2}}\,(x-2)^{\underline{-3}} = x(x-1)\,\frac{1}{(x-1)x(x+1)} = \frac{1}{x+1} = x^{\underline{-1}}.$$

If we had chosen to define $x^{\underline{-1}}$ as $1/x$ instead of as $1/(x+1)$, the law of exponents (2.52) would have failed in cases like $m = -1$ and $n = 1$. In fact, we could have used (2.52) to tell us exactly how falling powers ought to be defined in the case of negative exponents, by setting $m = -n$. When an existing notation is being extended to cover more cases, it’s always best to formulate definitions in such a way that general laws continue to hold.
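The extended definition and the law of exponents can be exercised with exact rational arithmetic. This Python sketch is our own illustration; the name `falling` and the test ranges are assumptions, and the values of $x$ are kept large enough that no denominator vanishes.

```python
from fractions import Fraction

def falling(x, m):
    """Falling power of x for any integer m, using (2.51) when m < 0."""
    x = Fraction(x)
    result = Fraction(1)
    if m >= 0:
        for i in range(m):
            result *= x - i
    else:
        for i in range(1, -m + 1):
            result /= x + i  # ZeroDivisionError if x is one of -1, -2, ..., m
    return result

# Law of exponents (2.52), for positive and negative exponents alike.
for x in range(10, 15):
    for m in range(-3, 4):
        for n in range(-3, 4):
            assert falling(x, m + n) == falling(x, m) * falling(x - m, n)
```

Defining the power with exponent $-1$ as $1/x$ instead would break the assertion at $m = -1$, $n = 1$, as the text points out.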


How  can  a complex 
number  be  even? 


Laws  have  their 
exponents  and  their 
detractors. 
Now let’s make sure that the crucial difference property holds for our newly defined falling powers. Does $\Delta x^{\underline{m}} = m x^{\underline{m-1}}$ when $m < 0$? If $m = -2$, for example, the difference is

$$\begin{aligned}
\Delta x^{\underline{-2}} &= \frac{1}{(x+2)(x+3)} - \frac{1}{(x+1)(x+2)}\\
&= \frac{(x+1) - (x+3)}{(x+1)(x+2)(x+3)}\\
&= -2x^{\underline{-3}}.
\end{aligned}$$

Yes, it works! A similar argument applies for all $m < 0$.

Therefore the summation property (2.50) holds for negative falling powers as well as positive ones, as long as no division by zero occurs:

$$\sum\nolimits_a^b x^{\underline{m}}\,\delta x = \frac{x^{\underline{m+1}}}{m+1}\bigg|_a^b, \quad\text{for } m \ne -1.$$

But what about when $m = -1$? Recall that for integration we use

$$\int_a^b x^{-1}\,dx = \ln x\,\Big|_a^b$$

when $m = -1$. We’d like to have a finite analog of $\ln x$; in other words, we seek a function $f(x)$ such that

$$x^{\underline{-1}} = \frac{1}{x+1} = \Delta f(x) = f(x+1) - f(x).$$

It’s not too hard to see that

$$f(x) = \frac11 + \frac12 + \cdots + \frac1x$$

is such a function, when $x$ is an integer, and this quantity is just the harmonic number $H_x$ of (2.13). Thus $H_x$ is the discrete analog of the continuous $\ln x$. (We will define $H_x$ for noninteger $x$ in Chapter 6, but integer values are good enough for present purposes. We’ll also see in Chapter 9 that, for large $x$, the value of $H_x - \ln x$ is approximately $0.577 + 1/(2x)$. Hence $H_x$ and $\ln x$ are not only analogous, their values usually differ by less than 1.)

0.577 exactly?
Maybe they mean
$1/\sqrt{3}$.
Then again,
maybe not.

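We can now give a complete description of the sums of falling powers:

$$\sum\nolimits_a^b x^{\underline{m}}\,\delta x = \begin{cases}
\dfrac{x^{\underline{m+1}}}{m+1}\bigg|_a^b, & \text{if } m \ne -1;\\[2ex]
H_x\,\Big|_a^b, & \text{if } m = -1.
\end{cases} \tag{2.53}$$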
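Both branches of (2.53) can be compared against direct summation. The Python sketch below is our own illustration; the names `falling`, `harmonic`, `sum_falling_closed`, and `sum_falling_brute` are assumptions, and the limits are restricted to integers $0 \le a \le b$ so that no division by zero occurs.

```python
from fractions import Fraction

def falling(x, m):
    """Falling power of x for any integer m, using (2.51) when m < 0."""
    x = Fraction(x)
    result = Fraction(1)
    if m >= 0:
        for i in range(m):
            result *= x - i
    else:
        for i in range(1, -m + 1):
            result /= x + i
    return result

def harmonic(n):
    """Harmonic number H_n = 1/1 + 1/2 + ... + 1/n for an integer n >= 0."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def sum_falling_closed(m, a, b):
    """Right-hand side of (2.53) for integers 0 <= a <= b."""
    if m == -1:
        return harmonic(b) - harmonic(a)
    return (falling(b, m + 1) - falling(a, m + 1)) / (m + 1)

def sum_falling_brute(m, a, b):
    """Direct sum of the m-th falling power of k over a <= k < b."""
    return sum((falling(k, m) for k in range(a, b)), Fraction(0))

for m in range(-4, 4):
    for a in range(0, 4):
        for b in range(a, a + 6):
            assert sum_falling_closed(m, a, b) == sum_falling_brute(m, a, b)
```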
This formula indicates why harmonic numbers tend to pop up in the solutions to discrete problems like the analysis of quicksort, just as so-called natural logarithms arise naturally in the solutions to continuous problems.

Now that we’ve found an analog for $\ln x$, let’s see if there’s one for $e^x$. What function $f(x)$ has the property that $\Delta f(x) = f(x)$, corresponding to the identity $De^x = e^x$? Easy:

$$f(x+1) - f(x) = f(x) \iff f(x+1) = 2f(x);$$

so we’re dealing with a simple recurrence, and we can take $f(x) = 2^x$ as the discrete exponential function.

The difference of $c^x$ is also quite simple, for arbitrary $c$, namely

$$\Delta(c^x) = c^{x+1} - c^x = (c-1)c^x.$$

Hence the anti-difference of $c^x$ is $c^x/(c-1)$, if $c \ne 1$. This fact, together with the fundamental laws (2.47) and (2.48), gives us a tidy way to understand the general formula for the sum of a geometric progression:

$$\sum_{a \le k < b} c^k = \sum\nolimits_a^b c^x\,\delta x = \frac{c^x}{c-1}\bigg|_a^b = \frac{c^b - c^a}{c-1}, \quad\text{for } c \ne 1.$$
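The geometric-progression formula is a one-liner once the anti-difference is known. This Python sketch is our own illustration; the name `geometric_sum` and the sample ratios are assumptions, and exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction

def geometric_sum(c, a, b):
    """Sum of c^k over a <= k < b via the anti-difference c^x/(c - 1).

    Assumes c != 1 and integers 0 <= a <= b.
    """
    c = Fraction(c)
    return (c**b - c**a) / (c - 1)

# The discrete exponential: the difference of 2^x is 2^x itself.
for x in range(10):
    assert 2**(x + 1) - 2**x == 2**x

# Compare the closed form with direct summation for a few sample ratios.
for c in (Fraction(1, 2), Fraction(-2, 3), 2, 3, -5):
    for a in range(4):
        for b in range(a, a + 6):
            assert geometric_sum(c, a, b) == sum(Fraction(c)**k for k in range(a, b))
```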


Every time we encounter a function $f$ that might be useful as a closed form, we can compute its difference $\Delta f = g$; then we have a function $g$ whose indefinite sum $\sum g(x)\,\delta x$ is known. Table 55 is the beginning of a table of difference/anti-difference pairs useful for summation.

‘Table 55’ is on
page 55. Get it?

Despite all the parallels between continuous and discrete math, some continuous notions have no discrete analog. For example, the chain rule of infinite calculus is a handy rule for the derivative of a function of a function; but there’s no corresponding chain rule of finite calculus, because there’s no nice form for $\Delta f(g(x))$. Discrete change-of-variables is hard, except in certain cases like the replacement of $x$ by $c \pm x$.

However, $\Delta(f(x)\,g(x))$ does have a fairly nice form, and it provides us with a rule for summation by parts, the finite analog of what infinite calculus calls integration by parts. Let’s recall that the formula

$$D(uv) = u\,Dv + v\,Du$$

of infinite calculus leads to the rule for integration by parts,

$$\int u\,Dv = uv - \int v\,Du,$$

after integration and rearranging terms; we can do a similar thing in finite calculus.

Infinite calculus
avoids $E$ here by
letting $1 \to 0$.

I guess $e^x = 2^x$, for
small values of 1.

Table 55 What’s the difference?

$$\begin{array}{ll|ll}
f = \sum g & \Delta f = g & f = \sum g & \Delta f = g\\ \hline
x^{\underline{0}} = 1 & 0 & 2^x & 2^x\\
x^{\underline{1}} = x & 1 & c^x & (c-1)c^x\\
x^{\underline{2}} = x(x-1) & 2x & c^x/(c-1) & c^x\\
x^{\underline{m}} & mx^{\underline{m-1}} & cf & c\,\Delta f\\
x^{\underline{m+1}}/(m+1) & x^{\underline{m}} & f + g & \Delta f + \Delta g\\
H_x & x^{\underline{-1}} = 1/(x+1) & fg & f\,\Delta g + Eg\,\Delta f
\end{array}$$

We start by applying the difference operator to the product of two functions $u(x)$ and $v(x)$:

$$\begin{aligned}
\Delta\bigl(u(x)\,v(x)\bigr) &= u(x+1)\,v(x+1) - u(x)\,v(x)\\
&= u(x+1)\,v(x+1) - u(x)\,v(x+1)\\
&\qquad + u(x)\,v(x+1) - u(x)\,v(x)\\
&= u(x)\,\Delta v(x) + v(x+1)\,\Delta u(x).
\end{aligned} \tag{2.54}$$

This formula can be put into a convenient form using the shift operator $E$, defined by

$$Ef(x) = f(x+1).$$

Substituting this for $v(x+1)$ yields a compact rule for the difference of a product:

$$\Delta(uv) = u\,\Delta v + Ev\,\Delta u. \tag{2.55}$$

(The $E$ is a bit of a nuisance, but it makes the equation correct.) Taking the indefinite sum on both sides of this equation, and rearranging its terms, yields the advertised rule for summation by parts:

$$\sum u\,\Delta v = uv - \sum Ev\,\Delta u. \tag{2.56}$$

As with infinite calculus, limits can be placed on all three terms, making the indefinite sums definite.
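The product rule (2.55) is easy to test pointwise. The Python sketch below is our own illustration; the helpers `delta`, `shift`, and `product_rule_holds` and the sample functions `u` and `v` are assumptions chosen for the example.

```python
def delta(f):
    """Difference operator: (delta f)(x) = f(x + 1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def shift(f):
    """Shift operator E: (E f)(x) = f(x + 1)."""
    return lambda x: f(x + 1)

def product_rule_holds(u, v, xs):
    """Check (2.55), delta(uv) = u*delta(v) + E(v)*delta(u), at every point of xs."""
    uv = lambda x: u(x) * v(x)
    return all(
        delta(uv)(x) == u(x) * delta(v)(x) + shift(v)(x) * delta(u)(x)
        for x in xs
    )

# Hypothetical sample functions on the nonnegative integers.
u = lambda x: x * x + 1
v = lambda x: 2**x

assert product_rule_holds(u, v, range(10))
# Swapping the roles of u and v gives the mirror-image rule, which also holds.
assert product_rule_holds(v, u, range(10))
```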

This rule is useful when the sum on the left is harder to evaluate than the one on the right. Let’s look at an example. The function $\int xe^x\,dx$ is typically integrated by parts; its discrete analog is $\sum x2^x\,\delta x$, which we encountered earlier this chapter in the form $\sum_{k=0}^n k2^k$. To sum this by parts, we let $u(x) = x$ and $\Delta v(x) = 2^x$; hence $\Delta u(x) = 1$, $v(x) = 2^x$, and $Ev(x) = 2^{x+1}$. Plugging into (2.56) gives

$$\sum x2^x\,\delta x = x2^x - \sum 2^{x+1}\,\delta x = x2^x - 2^{x+1} + C.$$

And we can use this to evaluate the sum we did before, by attaching limits:

$$\begin{aligned}
\sum_{k=0}^n k2^k &= \sum\nolimits_0^{n+1} x2^x\,\delta x\\
&= x2^x - 2^{x+1}\,\Big|_0^{n+1}\\
&= \bigl((n+1)2^{n+1} - 2^{n+2}\bigr) - (0\cdot2^0 - 2^1) = (n-1)2^{n+1} + 2.
\end{aligned}$$

It’s easier to find the sum this way than to use the perturbation method, because we don’t have to think.
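The closed form just derived can be confirmed by direct summation. This Python sketch is our own illustration; the names `antidiff` and `sum_k2k` are assumptions.

```python
def antidiff(x):
    """Anti-difference of x * 2^x found by summation by parts: x*2^x - 2^(x+1)."""
    return x * 2**x - 2**(x + 1)

def sum_k2k(n):
    """Sum of k * 2^k over 0 <= k <= n, i.e. the definite sum from 0 to n + 1."""
    return antidiff(n + 1) - antidiff(0)

for n in range(12):
    brute = sum(k * 2**k for k in range(n + 1))
    assert sum_k2k(n) == brute == (n - 1) * 2**(n + 1) + 2
```

The upper limit is $n + 1$ rather than $n$ because a definite sum excludes the value at its upper limit.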

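We stumbled across a formula for $\sum_{0 \le k < n} H_k$ earlier in this chapter, and counted ourselves lucky. But we could have found our formula (2.36) systematically, if we had known about summation by parts. Let’s demonstrate this assertion by tackling a sum that looks even harder, $\sum_{0 \le k < n} kH_k$. The solution is not difficult if we are guided by analogy with $\int x\ln x\,dx$: We take $u(x) = H_x$ and $\Delta v(x) = x = x^{\underline{1}}$, hence $\Delta u(x) = x^{\underline{-1}}$, $v(x) = x^{\underline{2}}/2$, $Ev(x) = (x+1)^{\underline{2}}/2$, and we have

$$\begin{aligned}
\sum xH_x\,\delta x &= \frac{x^{\underline{2}}}{2}H_x - \sum \frac{(x+1)^{\underline{2}}}{2}\,x^{\underline{-1}}\,\delta x\\
&= \frac{x^{\underline{2}}}{2}H_x - \frac12\sum x^{\underline{1}}\,\delta x\\
&= \frac{x^{\underline{2}}}{2}H_x - \frac{x^{\underline{2}}}{4} + C.
\end{aligned}$$

(In going from the first line to the second, we’ve combined two falling powers $(x+1)^{\underline{2}}\,x^{\underline{-1}}$ by using the law of exponents (2.52) with $m = -1$ and $n = 2$.) Now we can attach limits and conclude that

$$\sum_{0 \le k < n} kH_k = \sum\nolimits_0^n xH_x\,\delta x = \frac{n^{\underline{2}}}{2}\Bigl(H_n - \frac12\Bigr). \tag{2.57}$$

The ultimate goal
of mathematics
is to eliminate all
need for intelligent
thought.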
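Formula (2.57) can be verified exactly for small $n$. The Python sketch below is our own illustration; the names `harmonic` and `sum_kH_closed` are assumptions.

```python
from fractions import Fraction

def harmonic(n):
    """Harmonic number H_n = 1/1 + 1/2 + ... + 1/n for an integer n >= 0."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def sum_kH_closed(n):
    """Right-hand side of (2.57): n(n-1)/2 times (H_n - 1/2)."""
    return Fraction(n * (n - 1), 2) * (harmonic(n) - Fraction(1, 2))

for n in range(15):
    brute = sum((k * harmonic(k) for k in range(n)), Fraction(0))
    assert brute == sum_kH_closed(n)
```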

2.7  INFINITE  SUMS 

When we defined $\sum$-notation at the beginning of this chapter, we finessed the question of infinite sums by saying, in essence, “Wait until later. For now, we can assume that all the sums we meet have only finitely many nonzero terms.” But the time of reckoning has finally arrived; we must face the fact that sums can be infinite. And the truth is that infinite sums are bearers of both good news and bad news.

This is finesse?

First, the bad news: It turns out that the methods we’ve used for manipulating $\sum$’s are not always valid when infinite sums are involved. But next, the good news: There is a large, easily understood class of infinite sums for which all the operations we’ve been performing are perfectly legitimate. The reasons underlying both these news items will be clear after we have looked more closely at the underlying meaning of summation.

Everybody  knows  what  a finite  sum  is:  We  add  up  a bunch  of  terms,  one 
by  one,  until  they’ve  all  been  added.  But  an  infinite  sum  needs  to  be  defined 
more  carefully,  lest  we  get  into  paradoxical  situations. 

For example, it seems natural to define things so that the infinite sum

$$S = 1 + \tfrac12 + \tfrac14 + \tfrac18 + \tfrac1{16} + \tfrac1{32} + \cdots$$

is equal to 2, because if we double it we get

$$2S = 2 + 1 + \tfrac12 + \tfrac14 + \tfrac18 + \tfrac1{16} + \cdots = 2 + S.$$

On the other hand, this same reasoning suggests that we ought to define

$$T = 1 + 2 + 4 + 8 + 16 + 32 + \cdots$$

to be $-1$, for if we double it we get

$$2T = 2 + 4 + 8 + 16 + 32 + 64 + \cdots = T - 1.$$

Something funny is going on; how can we get a negative number by summing positive quantities? It seems better to leave $T$ undefined; or perhaps we should say that $T = \infty$, since the terms being added in $T$ become larger than any fixed, finite number. (Notice that $\infty$ is another “solution” to the equation $2T = T - 1$; it also “solves” the equation $2S = 2 + S$.)

Sure: $1 + 2 +
4 + 8 + \cdots$ is the
“infinite precision”
representation of
the number $-1$,
in a binary com-
puter with infinite
word size.

Let’s try to formulate a good definition for the value of a general sum $\sum_{k \in K} a_k$, where $K$ might be infinite. For starters, let’s assume that all the terms $a_k$ are nonnegative. Then a suitable definition is not hard to find: If there’s a bounding constant $A$ such that

$$\sum_{k \in F} a_k \le A$$

for all finite subsets $F \subset K$, then we define $\sum_{k \in K} a_k$ to be the least such $A$. (It follows from well-known properties of the real numbers that the set of all such $A$ always contains a smallest element.) But if there’s no bounding constant $A$, we say that $\sum_{k \in K} a_k = \infty$; this means that if $A$ is any real number, there’s a set of finitely many terms $a_k$ whose sum exceeds $A$.
The definition in the previous paragraph has been formulated carefully so that it doesn’t depend on any order that might exist in the index set $K$. Therefore the arguments we are about to make will apply to multiple sums with many indices $k_1, k_2, \ldots$, not just to sums over the set of integers.

In the special case that $K$ is the set of nonnegative integers, our definition for nonnegative terms $a_k$ implies that

$$\sum_{k \ge 0} a_k = \lim_{n\to\infty} \sum_{k=0}^n a_k.$$

Here’s why: Any nondecreasing sequence of real numbers has a limit (possibly $\infty$). If the limit is $A$, and if $F$ is any finite set of nonnegative integers whose elements are all $\le n$, we have $\sum_{k \in F} a_k \le \sum_{k=0}^n a_k \le A$; hence $A = \infty$ or $A$ is a bounding constant. And if $A'$ is any number less than the stated limit $A$, then there’s an $n$ such that $\sum_{k=0}^n a_k > A'$; hence the finite set $F = \{0, 1, \ldots, n\}$ witnesses to the fact that $A'$ is not a bounding constant.

The set $K$ might
even be uncount-
able. But only a
countable num-
ber of terms can
be nonzero, if a
bounding constant
$A$ exists, because at
most $nA$ terms are
$\ge 1/n$.

We can now easily compute the value of certain infinite sums, according to the definition just given. For example, if $a_k = x^k$, we have

$$\sum_{k \ge 0} x^k = \lim_{n\to\infty} \frac{1 - x^{n+1}}{1 - x} = \begin{cases}
1/(1-x), & \text{if } 0 \le x < 1;\\
\infty, & \text{if } x \ge 1.
\end{cases}$$

In particular, the infinite sums $S$ and $T$ considered a minute ago have the respective values 2 and $\infty$, just as we suspected. Another interesting example is

$$\begin{aligned}
\sum_{k \ge 0} \frac{1}{(k+1)(k+2)} &= \sum_{k \ge 0} k^{\underline{-2}}\\
&= \lim_{n\to\infty} \sum_{k=0}^{n-1} k^{\underline{-2}} = \lim_{n\to\infty} \frac{k^{\underline{-1}}}{-1}\bigg|_0^n = 1.
\end{aligned}$$
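The two limits above can be watched converging in exact arithmetic. The Python sketch below is our own illustration; the names `partial_geometric` and `partial_inverse_products` are assumptions.

```python
from fractions import Fraction

def partial_geometric(x, n):
    """Sum of x^k for 0 <= k <= n, assuming x != 1, via (1 - x^(n+1))/(1 - x)."""
    x = Fraction(x)
    return (1 - x**(n + 1)) / (1 - x)

def partial_inverse_products(n):
    """Sum of 1/((k+1)(k+2)) for 0 <= k < n."""
    return sum((Fraction(1, (k + 1) * (k + 2)) for k in range(n)), Fraction(0))

for n in range(20):
    # Partial sums of S = 1 + 1/2 + 1/4 + ... fall short of 2 by exactly 1/2^n.
    assert partial_geometric(Fraction(1, 2), n) == 2 - Fraction(1, 2**n)
    # Partial sums of the second example fall short of 1 by exactly 1/(n+1).
    assert partial_inverse_products(n) == 1 - Fraction(1, n + 1)
```

In both cases the shortfall tends to zero, so the least bounding constants are 2 and 1 respectively.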


Now let’s consider the case that the sum might have negative terms as well as nonnegative ones. What, for example, should be the value of

$$\sum_{k \ge 0} (-1)^k = 1 - 1 + 1 - 1 + 1 - 1 + \cdots\,?$$

If we group the terms in pairs, we get

$$(1 - 1) + (1 - 1) + (1 - 1) + \cdots = 0 + 0 + 0 + \cdots,$$

so the sum comes out zero; but if we start the pairing one step later, we get

$$1 - (1 - 1) - (1 - 1) - (1 - 1) - \cdots = 1 - 0 - 0 - 0 - \cdots;$$

the sum is 1.


“Aggregatum
quantitatum
$a - a + a - a + a - a$
etc. nunc est $= a$,
nunc $= 0$, adeoque
continuata in infini-
tum serie ponendus
$= a/2$, fateor
acumen et veritatem
animadversionis”

— G. Grandi [133]
Is this the first page
with no graffiti?


We might also try setting $x = -1$ in the formula $\sum_{k \ge 0} x^k = 1/(1-x)$, since we’ve proved that this formula holds when $0 \le x < 1$; but then we are forced to conclude that the infinite sum is $\frac12$, although it’s a sum of integers!

Another interesting example is the doubly infinite $\sum_k a_k$ where $a_k = 1/(k+1)$ for $k \ge 0$ and $a_k = 1/(k-1)$ for $k < 0$. We can write this as

$$\cdots + \bigl(-\tfrac14\bigr) + \bigl(-\tfrac13\bigr) + \bigl(-\tfrac12\bigr) + 1 + \tfrac12 + \tfrac13 + \tfrac14 + \cdots. \tag{2.58}$$

If we evaluate this sum by starting at the “center” element and working outward,

$$\cdots + \Bigl(-\tfrac14 + \bigl(-\tfrac13 + \bigl(-\tfrac12 + (1) + \tfrac12\bigr) + \tfrac13\bigr) + \tfrac14\Bigr) + \cdots,$$

we get the value 1; and we obtain the same value 1 if we shift all the parentheses one step to the left,

$$\cdots + \Bigl(-\tfrac15 + \bigl(-\tfrac14 + \bigl(-\tfrac13 + \bigl(-\tfrac12\bigr) + 1\bigr) + \tfrac12\bigr) + \tfrac13\Bigr) + \cdots,$$

because the sum of all numbers inside the innermost $n$ parentheses is

$$-\frac{1}{n+1} - \frac1n - \cdots - \frac12 + 1 + \frac12 + \cdots + \frac{1}{n-1} = 1 - \frac1n - \frac{1}{n+1}.$$

A similar argument shows that the value is 1 if these parentheses are shifted any fixed amount to the left or right; this encourages us to believe that the sum is indeed 1. On the other hand, if we group terms in the following way,

$$\cdots + \Bigl(-\tfrac14 + \bigl(-\tfrac13 + \bigl(-\tfrac12 + 1 + \tfrac12\bigr) + \tfrac13 + \tfrac14\bigr) + \tfrac15 + \tfrac16\Bigr) + \cdots,$$

the $n$th pair of parentheses from inside out contains the numbers

$$-\frac{1}{n+1} - \frac1n - \cdots - \frac12 + 1 + \frac12 + \cdots + \frac{1}{2n-1} + \frac{1}{2n} = 1 + H_{2n} - H_{n+1}.$$

We’ll prove in Chapter 9 that $\lim_{n\to\infty}(H_{2n} - H_{n+1}) = \ln 2$; hence this grouping suggests that the doubly infinite sum should really be equal to $1 + \ln 2$.
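The three groupings can be compared numerically. The Python sketch below is our own illustration; the names `a`, `harmonic`, `centered`, `shifted`, and `lopsided` are assumptions, each computing the sum of the terms enclosed by the $n$th pair of parentheses in one of the groupings above.

```python
import math
from fractions import Fraction

def a(k):
    """Term a_k of the doubly infinite sum (2.58)."""
    return Fraction(1, k + 1) if k >= 0 else Fraction(1, k - 1)

def harmonic(n):
    """Harmonic number H_n for an integer n >= 0."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def centered(n):
    """Terms with -n <= k <= n: the grouping that works outward from the center."""
    return sum((a(k) for k in range(-n, n + 1)), Fraction(0))

def shifted(n):
    """Terms with -n <= k <= n - 2: the same grouping shifted one step left."""
    return sum((a(k) for k in range(-n, n - 1)), Fraction(0))

def lopsided(n):
    """Terms with -n <= k < 2n: the grouping that takes two positive terms per step."""
    return sum((a(k) for k in range(-n, 2 * n)), Fraction(0))

for n in range(1, 30):
    assert centered(n) == 1
    assert shifted(n) == 1 - Fraction(1, n) - Fraction(1, n + 1)
    assert lopsided(n) == 1 + harmonic(2 * n) - harmonic(n + 1)

# The lopsided grouping drifts toward 1 + ln 2 rather than 1.
assert abs(float(lopsided(500)) - (1 + math.log(2))) < 5e-3
```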

There’s something flaky about a sum that gives different values when its terms are added up in different ways. Advanced texts on analysis have a variety of definitions by which meaningful values can be assigned to such pathological sums; but if we adopt those definitions, we cannot operate with $\sum$-notation as freely as we have been doing. We don’t need the delicate refinements of “conditional convergence” for the purposes of this book; therefore we’ll stick to a definition of infinite sums that preserves the validity of all the operations we’ve been doing in this chapter.
In fact, our definition of infinite sums is quite simple. Let $K$ be any set, and let $a_k$ be a real-valued term defined for each $k \in K$. (Here ‘$k$’ might actually stand for several indices $k_1, k_2, \ldots$, and $K$ might therefore be multidimensional.) Any real number $x$ can be written as the difference of its positive and negative parts,

$$x = x^+ - x^-, \quad\text{where } x^+ = x\cdot[x > 0] \text{ and } x^- = -x\cdot[x < 0].$$

(Either $x^+ = 0$ or $x^- = 0$.) We’ve already explained how to define values for the infinite sums $\sum_{k \in K} a_k^+$ and $\sum_{k \in K} a_k^-$, because $a_k^+$ and $a_k^-$ are nonnegative. Therefore our general definition is

$$\sum_{k \in K} a_k = \sum_{k \in K} a_k^+ - \sum_{k \in K} a_k^-, \tag{2.59}$$

unless the right-hand sums are both equal to $\infty$. In the latter case, we leave $\sum_{k \in K} a_k$ undefined.

Let $A^+ = \sum_{k \in K} a_k^+$ and $A^- = \sum_{k \in K} a_k^-$. If $A^+$ and $A^-$ are both finite, the sum $\sum_{k \in K} a_k$ is said to converge absolutely to the value $A = A^+ - A^-$. If $A^+ = \infty$ but $A^-$ is finite, the sum $\sum_{k \in K} a_k$ is said to diverge to $+\infty$. Similarly, if $A^- = \infty$ but $A^+$ is finite, $\sum_{k \in K} a_k$ is said to diverge to $-\infty$. If $A^+ = A^- = \infty$, all bets are off.

We started with a definition that worked for nonnegative terms, then we extended it to real-valued terms. If the terms $a_k$ are complex numbers, we can extend the definition once again, in the obvious way: The sum $\sum_{k \in K} a_k$ is defined to be $\sum_{k \in K} \Re a_k + i\sum_{k \in K} \Im a_k$, where $\Re a_k$ and $\Im a_k$ are the real and imaginary parts of $a_k$, provided that both of those sums are defined. Otherwise $\sum_{k \in K} a_k$ is undefined. (See exercise 18.)

The bad news, as stated earlier, is that some infinite sums must be left undefined, because the manipulations we’ve been doing can produce inconsistencies in all such cases. (See exercise 34.) The good news is that all of the manipulations of this chapter are perfectly valid whenever we’re dealing with sums that converge absolutely, as just defined.

We  can  verify  the  good  news  by  showing  that  each  of  our  transformation 
rules  preserves  the  value  of  all  absolutely  convergent  sums.  This  means,  more 
explicitly,  that  we  must  prove  the  distributive,  associative,  and  commutative 
laws,  plus  the  rule  for  summing  first  on  one  index  variable;  everything  else 
we’ve  done  has  been  derived  from  those  four  basic  operations  on  sums. 

The distributive law (2.15) can be formulated more precisely as follows: If $\sum_{k \in K} a_k$ converges absolutely to $A$ and if $c$ is any complex number, then $\sum_{k \in K} ca_k$ converges absolutely to $cA$. We can prove this by breaking the sum into real and imaginary, positive and negative parts as above, and by proving the special case in which $c > 0$ and each term $a_k$ is nonnegative. The proof in this special case works because $\sum_{k \in F} ca_k = c\sum_{k \in F} a_k$ for all finite sets $F$; the latter fact follows by induction on the size of $F$.

In other words, ab-
solute convergence
means that the sum
of absolute values
converges.

The associative law (2.16) can be stated as follows: If $\sum_{k \in K} a_k$ and $\sum_{k \in K} b_k$ converge absolutely to $A$ and $B$, respectively, then $\sum_{k \in K} (a_k + b_k)$ converges absolutely to $A + B$. This turns out to be a special case of a more general theorem that we will prove shortly.

The commutative law (2.17) doesn’t really need to be proved, because we have shown in the discussion following (2.35) how to derive it as a special case of a general rule for interchanging the order of summation.

Best to skim this
page the first time
you get here.
Your friendly TA

The main result we need to prove is the fundamental principle of multiple sums: Absolutely convergent sums over two or more indices can always be summed first with respect to any one of those indices. Formally, we shall prove that if $J$ and the elements of $\{K_j \mid j \in J\}$ are any sets of indices such that

$$\sum_{\substack{j \in J\\ k \in K_j}} a_{j,k} \text{ converges absolutely to } A,$$

then there exist complex numbers $A_j$ for each $j \in J$ such that

$$\sum_{k \in K_j} a_{j,k} \text{ converges absolutely to } A_j, \text{ and}$$

$$\sum_{j \in J} A_j \text{ converges absolutely to } A.$$

It suffices to prove this assertion when all terms are nonnegative, because we can prove the general case by breaking everything into real and imaginary, positive and negative parts as before. Let’s assume therefore that $a_{j,k} \ge 0$ for all pairs $(j,k) \in M$, where $M$ is the master index set $\{(j,k) \mid j \in J, k \in K_j\}$.

We are given that $\sum_{(j,k) \in M} a_{j,k}$ is finite, namely that

$$\sum_{(j,k) \in F} a_{j,k} \le A$$

for all finite subsets $F \subset M$, and that $A$ is the least such upper bound. If $j$ is any element of $J$, each sum of the form $\sum_{k \in F_j} a_{j,k}$ where $F_j$ is a finite subset of $K_j$ is bounded above by $A$. Hence these finite sums have a least upper bound $A_j \ge 0$, and $\sum_{k \in K_j} a_{j,k} = A_j$ by definition.

We still need to prove that $A$ is the least upper bound of $\sum_{j \in G} A_j$, for all finite subsets $G \subset J$. Suppose that $G$ is a finite subset of $J$ with $\sum_{j \in G} A_j = A' > A$. We can find finite subsets $F_j \subset K_j$ such that $\sum_{k \in F_j} a_{j,k} > (A/A')A_j$ for each $j \in G$ with $A_j > 0$. There is at least one such $j$. But then $\sum_{j \in G,\, k \in F_j} a_{j,k} > (A/A')\sum_{j \in G} A_j = A$, contradicting the fact that we have $\sum_{(j,k) \in F} a_{j,k} \le A$ for all finite subsets $F \subset M$. Hence $\sum_{j \in G} A_j \le A$, for all finite subsets $G \subset J$.

Finally, let $A'$ be any real number less than $A$. Our proof will be complete if we can find a finite set $G \subset J$ such that $\sum_{j \in G} A_j > A'$. We know that there’s a finite set $F \subset M$ such that $\sum_{(j,k) \in F} a_{j,k} > A'$; let $G$ be the set of $j$’s in this $F$, and let $F_j = \{k \mid (j,k) \in F\}$. Then $\sum_{j \in G} A_j \ge \sum_{j \in G} \sum_{k \in F_j} a_{j,k} = \sum_{(j,k) \in F} a_{j,k} > A'$; QED.

OK, we’re now legitimate! Everything we’ve been doing with infinite sums is justified, as long as there’s a finite bound on all finite sums of the absolute values of the terms. Since the doubly infinite sum (2.58) gave us two different answers when we evaluated it in two different ways, its positive terms $1 + \frac12 + \frac13 + \cdots$ must diverge to $\infty$; otherwise we would have gotten the same answer no matter how we grouped the terms.


So why have I been
hearing a lot lately
about “harmonic
convergence”?


Exercises 

Warmups 

1 What does the notation

$$\sum_{k=4}^{0} q_k$$

mean?

2 Simplify the expression $x\cdot([x > 0] - [x < 0])$.

3 Demonstrate your understanding of $\sum$-notation by writing out the sums

$$\sum_{0 \le k \le 5} a_k \quad\text{and}\quad \sum_{0 \le k^2 \le 5} a_{k^2}$$

in full. (Watch out: the second sum is a bit tricky.)

4 Express the triple sum

$$\sum_{1 \le i < j < k \le 4} a_{ijk}$$

as a three-fold summation (with three $\sum$’s),
a summing first on $k$, then $j$, then $i$;
b summing first on $i$, then $j$, then $k$.

Also write your triple sums out in full without the $\sum$-notation, using parentheses to show what is being added together first.
Yield to the rising
power.


5 What’s wrong with the following derivation?

$$\biggl(\sum_{j=1}^n a_j\biggr)\biggl(\sum_{k=1}^n \frac{1}{a_k}\biggr) = \sum_{j=1}^n \sum_{k=1}^n \frac{a_j}{a_k} = \sum_{k=1}^n \sum_{k=1}^n \frac{a_k}{a_k} = \sum_{k=1}^n n = n^2.$$

6 What is the value of $\sum_k [1 \le j \le k \le n]$, as a function of $j$ and $n$?

7 Let $\nabla f(x) = f(x) - f(x-1)$. What is $\nabla(x^{\overline{m}})$?

8 What is the value of $0^{\underline{m}}$, when $m$ is a given integer?

9 What is the law of exponents for rising factorial powers, analogous to (2.52)? Use this to define $x^{\overline{-n}}$.

10 The text derives the following formula for the difference of a product:

$$\Delta(uv) = u\,\Delta v + Ev\,\Delta u.$$

How can this formula be correct, when the left-hand side is symmetric with respect to $u$ and $v$ but the right-hand side is not?


Basics 

1 1  The  general  rule  (2.56)  for  summation  by  parts  is  equivalent  to 

(Q-k+1  — Qk)Ek  = fl-nbn  <1(5^0 

0^k<n 

- Y.  ak+i  (frk+i  bk),  forn^O. 

0$k<n 


Prove  this  formula  directly  by  using  the  distributive,  associative,  and 
commutative  laws. 

12  Show  that  the  function  p(k)  = k+  (— l)kc  is  a permutation  of  the  set  of 
all  integers,  whenever  c is  an  integer. 

13  Use  the  repertoire  method  to  find  a closed  form  for  !Lo(-Dkk2- 

14  Evaluate  ^k=1  k,2k  by  rewriting  it  as  the  multiple  sum  Xli<j<k<n  2 k • 

15  Evaluate  ®n  = k3  by  the  text’s  Method  5 as  follows:  First  write 

®n  0 □ ■ B 2£1^$k^Jk;thenapply(2.33). 

16 Prove that $x^{\underline{m}}/(x-n)^{\underline{m}} = x^{\underline{n}}/(x-m)^{\underline{n}}$, unless one of the denominators
is zero.

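17 Show that the following formulas can be used to convert between rising
and falling factorial powers, for all integers m:

$$x^{\overline{m}} = (-1)^m (-x)^{\underline{m}} = (x+m-1)^{\underline{m}} = 1/(x-1)^{\underline{-m}} ;$$
$$x^{\underline{m}} = (-1)^m (-x)^{\overline{m}} = (x-m+1)^{\overline{m}} = 1/(x+1)^{\overline{-m}} .$$

(The answer to exercise 9 defines $x^{\overline{-m}}$.)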
18 Let $\Re z$ and $\Im z$ be the real and imaginary parts of the complex
number z. The absolute value $|z|$ is $\sqrt{(\Re z)^2 + (\Im z)^2}$. A sum $\sum_{k \in K} a_k$ of
complex terms $a_k$ is said to converge absolutely when the real-valued sums
$\sum_{k \in K} \Re a_k$ and $\sum_{k \in K} \Im a_k$ both converge absolutely. Prove that $\sum_{k \in K} a_k$
converges absolutely if and only if there is a bounding constant B such
that $\sum_{k \in F} |a_k| \le B$ for all finite subsets $F \subseteq K$.

Homework  exercises 

19  Use  a summation  factor  to  solve  the  recurrence 

$T_0 = 5$;

$2T_n = nT_{n-1} + 3 \cdot n!$, for n > 0.

20 Try to evaluate $\sum_{k=0}^{n} kH_k$ by the perturbation method, but deduce the
value of $\sum_{k=0}^{n} H_k$ instead.

21 Evaluate the sums $S_n = \sum_{k=0}^{n} (-1)^{n-k}$, $T_n = \sum_{k=0}^{n} (-1)^{n-k} k$, and $U_n =
\sum_{k=0}^{n} (-1)^{n-k} k^2$ by the perturbation method, assuming that $n \ge 0$.

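22 Prove Lagrange's identity (without using induction):

$$\sum_{1 \le j < k \le n} (a_j b_k - a_k b_j)^2 = \Bigl(\sum_{k=1}^{n} a_k^2\Bigr)\Bigl(\sum_{k=1}^{n} b_k^2\Bigr) - \Bigl(\sum_{k=1}^{n} a_k b_k\Bigr)^2 .$$

This, incidentally, implies Cauchy’s inequality,

$$\Bigl(\sum_{k=1}^{n} a_k b_k\Bigr)^2 \le \Bigl(\sum_{k=1}^{n} a_k^2\Bigr)\Bigl(\sum_{k=1}^{n} b_k^2\Bigr) .$$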

23 Evaluate the sum $\sum_{k=1}^{n} (2k+1)/(k(k+1))$ in two ways:
a Replace $1/(k(k+1))$ by the “partial fractions” $1/k - 1/(k+1)$.
b Sum by parts.

24 What is $\sum_{0 \le k < n} H_k/((k+1)(k+2))$? Hint: Generalize the derivation of
(2.57).

25 The notation $\prod_{k \in K} a_k$ means the product of the numbers $a_k$ for all $k \in K$.
Assume for simplicity that $a_k \ne 1$ for only finitely many k; hence infinite
products need not be defined. What laws does this $\prod$-notation satisfy,
analogous to the distributive, associative, and commutative laws that
hold for $\sum$?

26 Express the double product $\prod_{1 \le j \le k \le n} a_j a_k$ in terms of the single product
$\prod_{k=1}^{n} a_k$ by manipulating $\prod$-notation. (This exercise gives us a product
analog of the upper-triangle identity (2.33).)


It's  hard  to  prove 
the  identity  of 

somebody  who’s 
been  dead  for  1 75 
years. 


This  notation  was 
introduced  by 
Jacobi  in  1829  [162], 
27 Compute $\Delta(c^{\underline{x}})$, and use it to deduce the value of $\sum_{k=1}^{n} (-2)^{\underline{k}}/k$.

28  At  what  point  does  the  following  derivation  go  astray? 


1 


L 


1 

k(k  + 1) 


iii-k-n) 

Eli  = k-|]) 

bk=>+']) 

y — — 


Exam problems

29 Evaluate the sum $\sum_{k=1}^{n} (-1)^k k/(4k^2 - 1)$.

30  Cribbage  players  have  long  been  aware  that  15  = 7 + 8=  4 + 5+  6 = 
1+2+3  + 4 + 5.  Find  the  number  of  ways  to  represent  1050  as  a sum  of 
consecutive  positive  integers.  (The  trivial  representation  ‘1050’  by  itself 
counts  as  one  way;  thus  there  are  four,  not  three,  ways  to  represent  15 
as  a sum  of  consecutive  positive  integers.  Incidentally,  a knowledge  of 
cribbage  rules  is  of  no  use  in  this  problem.) 

31 Riemann’s zeta function $\zeta(k)$ is defined to be the infinite sum

$$1 + \frac{1}{2^k} + \frac{1}{3^k} + \cdots = \sum_{j \ge 1} \frac{1}{j^k} .$$

Prove that $\sum_{k \ge 2} (\zeta(k) - 1) = 1$. What is the value of $\sum_{k \ge 1} (\zeta(2k) - 1)$?

32 Let $a \mathbin{\dot{-}} b = \max(0, a - b)$. Prove that

$$\sum_{k \ge 0} \min(k,\; x \mathbin{\dot{-}} k) = \sum_{k \ge 0} \bigl(x \mathbin{\dot{-}} (2k+1)\bigr)$$

for all real $x \ge 0$, and evaluate the sums in closed form.

Bonus  problems 

33 Let $\bigwedge_{k \in K} a_k$ denote the minimum of the numbers $a_k$ (or their greatest
lower bound, if K is infinite), assuming that each $a_k$ is either real or $\pm\infty$.
What laws are valid for $\bigwedge$-notation, analogous to those that work for $\sum$
and $\prod$? (See exercise 25.)


The  laws  of  the 
jungle. 
34 Prove that if the sum $\sum_{k \in K} a_k$ is undefined according to (2.59), then it
is extremely flaky in the following sense: If $A^-$ and $A^+$ are any given
real numbers, it’s possible to find a sequence of finite subsets $F_1 \subset F_2 \subset
F_3 \subset \cdots$ of K such that

$$\sum_{k \in F_n} a_k \le A^- , \text{ when n is odd}; \qquad \sum_{k \in F_n} a_k \ge A^+ , \text{ when n is even}.$$

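35 Prove Goldbach’s theorem

$$1 = \frac{1}{3} + \frac{1}{7} + \frac{1}{8} + \frac{1}{15} + \frac{1}{24} + \frac{1}{26} + \frac{1}{31} + \frac{1}{35} + \cdots = \sum_{k \in P} \frac{1}{k-1} ,$$

where P is the set of “perfect powers” defined recursively as follows:

$$P = \{\, m^n \mid m \ge 2,\ n \ge 2,\ m \notin P \,\} .$$


Perfect power
corrupts perfectly.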


36 Solomon Golomb’s “self-describing sequence” (f(1), f(2), f(3), ...) is the
only nondecreasing sequence of positive integers with the property that
it contains exactly f(k) occurrences of k for each k. A few moments’
thought reveals that the sequence must begin as follows:

n     1  2  3  4  5  6  7  8  9  10  11  12
f(n)  1  2  2  3  3  4  4  4  5   5   5   6

Let g(n) be the largest integer m such that f(m) = n. Show that
a $g(n) = \sum_{k=1}^{n} f(k)$.
b $g(g(n)) = \sum_{k=1}^{n} k f(k)$.
c $g(g(g(n))) = \frac{1}{2} n\, g(n) (g(n) + 1) - \frac{1}{2} \sum_{k=1}^{n-1} g(k) (g(k) + 1)$.

Research  problem 

37 Will all the 1/k by 1/(k + 1) rectangles, for $k \ge 1$, fit together inside a
1 by 1 square? (Recall that their areas sum to 1.)




)Ouch.( 


Integer  Functions 


WHOLE  NUMBERS  constitute  the  backbone  of  discrete  mathematics,  and  we 
often  need  to  convert  from  fractions  or  arbitrary  real  numbers  to  integers.  Our 
goal  in  this  chapter  is  to  gain  familiarity  and  fluency  with  such  conversions 
and  to  learn  some  of  their  remarkable  properties. 

3.1  FLOORS  AND  CEILINGS 

We  start  by  covering  the  floor  (greatest  integer)  and  ceiling  (least 
integer)  functions,  which  are  defined  for  all  real  x as  follows: 

$\lfloor x \rfloor$ = the greatest integer less than or equal to x;

$\lceil x \rceil$ = the least integer greater than or equal to x.    (3.1)

Kenneth E. Iverson introduced this notation, as well as the names “floor” and
“ceiling,” early in the 1960s [161, page 12]. He found that typesetters could
handle the symbols by shaving the tops and bottoms off of ‘[’ and ‘]’. His
notation has become sufficiently popular that floor and ceiling brackets can
now be used in a technical paper without an explanation of what they mean.
Until recently, people had most often been writing ‘[x]’ for the greatest integer
$\le x$, without a good equivalent for the least integer function. Some authors
had even tried to use ‘]x[’, with a predictable lack of success.

Besides variations in notation, there are variations in the functions
themselves. For example, some pocket calculators have an INT function, defined
as $\lfloor x \rfloor$ when x is positive and $\lceil x \rceil$ when x is negative. The designers of
these calculators probably wanted their INT function to satisfy the
identity INT(−x) = −INT(x). But we’ll stick to our floor and ceiling functions,
because they have even nicer properties than this.

One good way to become familiar with the floor and ceiling functions
is to understand their graphs, which form staircase-like patterns above and
below the line f(x) = x:


We see from the graph that, for example,

$\lfloor e \rfloor = 2$,   $\lfloor -e \rfloor = -3$,
$\lceil e \rceil = 3$,   $\lceil -e \rceil = -2$,

since e = 2.71828... .
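
These values, and the calculator-style INT function mentioned earlier, can be checked with a short Python sketch. The helper name `calculator_int` is hypothetical; it simply assumes INT truncates toward zero, as described above.

```python
import math

def calculator_int(x):
    """Assumed calculator INT: floor for x >= 0, ceiling for x < 0."""
    return math.floor(x) if x >= 0 else math.ceil(x)

# (floor, ceiling, INT) for e and -e
values = {name: (math.floor(v), math.ceil(v), calculator_int(v))
          for name, v in [("e", math.e), ("-e", -math.e)]}

for name, (fl, ce, tr) in values.items():
    print(f"{name}: floor={fl} ceiling={ce} INT={tr}")
```

INT agrees with the floor at e and with the ceiling at −e, which is what makes INT(−x) = −INT(x) hold.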

By staring at this illustration we can observe several facts about floors
and ceilings. First, since the floor function lies on or below the diagonal line
f(x) = x, we have $\lfloor x \rfloor \le x$; similarly $\lceil x \rceil \ge x$. (This, of course, is quite
obvious from the definition.) The two functions are equal precisely at the
integer points:

$\lfloor x \rfloor = x \iff x \text{ is an integer} \iff \lceil x \rceil = x .$

(We use the notation ‘$\iff$’ to mean “if and only if.”) Furthermore, when
they differ the ceiling is exactly 1 higher than the floor:

Cute. 

By Iverson’s bracket
convention,  this  is  a 
complete  equation. 


$\lceil x \rceil - \lfloor x \rfloor = [x \text{ is not an integer}]$ .    (3.2)

If we shift the diagonal line down one unit, it lies completely below the floor
function, so $x - 1 < \lfloor x \rfloor$; similarly $x + 1 > \lceil x \rceil$. Combining these observations
gives us

$x - 1 < \lfloor x \rfloor \le x \le \lceil x \rceil < x + 1 .$    (3.3)


Finally, the functions are reflections of each other about both axes:

$\lfloor -x \rfloor = -\lceil x \rceil ; \qquad \lceil -x \rceil = -\lfloor x \rfloor .$    (3.4)
Next week we’re
getting  walls. 


Thus  each  is  easily  expressible  in  terms  of  the  other.  This  fact  helps  to 
explain  why  the  ceiling  function  once  had  no  notation  of  its  own.  But  we 
see  ceilings  often  enough  to  warrant  giving  them  special  symbols,  just  as  we 
have  adopted  special  notations  for  rising  powers  as  well  as  falling  powers. 
Mathematicians  have  long  had  both  sine  and  cosine,  tangent  and  cotangent, 
secant  and  cosecant,  max  and  min;  now  we  also  have  both  floor  and  ceiling. 

To actually prove properties about the floor and ceiling functions, rather
than just to observe such facts graphically, the following four rules are
especially useful:

$\lfloor x \rfloor = n \iff n \le x < n + 1$ ,    (a)
$\lfloor x \rfloor = n \iff x - 1 < n \le x$ ,    (b)
$\lceil x \rceil = n \iff n - 1 < x \le n$ ,    (c)
$\lceil x \rceil = n \iff x \le n < x + 1$ .    (d)        (3.5)

(We  assume  in  all  four  cases  that  n is  an  integer  and  that  x is  real.)  Rules 
(a)  and  (c)  are  immediate  consequences  of  definition  (3.1);  rules  (b)  and  (d) 
are  the  same  but  with  the  inequalities  rearranged  so  that  n is  in  the  middle. 
It’s possible to move an integer term in or out of a floor (or ceiling):

$\lfloor x + n \rfloor = \lfloor x \rfloor + n$,  integer n.    (3.6)

(Because rule (3.5(a)) says that this assertion is equivalent to the inequalities
$\lfloor x \rfloor + n \le x + n < \lfloor x \rfloor + n + 1$.) But similar operations, like moving out a
constant factor, cannot be done in general. For example, we have $\lfloor nx \rfloor \ne n \lfloor x \rfloor$
when n = 2 and x = 1/2. This means that floor and ceiling brackets are
comparatively inflexible. We are usually happy if we can get rid of them or if
we can prove anything at all when they are present.
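
As a quick sanity check of (3.6) and of the counterexample, here is a minimal Python sketch. It uses exact rationals (`fractions.Fraction`) so that no floating-point rounding interferes; the helper name `shift_rule_holds` and the sample grid are illustrative choices only.

```python
from fractions import Fraction
from math import floor

def shift_rule_holds(x, n):
    """Check (3.6): floor(x + n) == floor(x) + n for an integer n."""
    return floor(x + n) == floor(x) + n

# Sample x over multiples of 1/7 in [-30/7, 30/7] and small integers n.
samples = [Fraction(p, 7) for p in range(-30, 31)]
all_ok = all(shift_rule_holds(x, n) for x in samples for n in range(-5, 6))
print(all_ok)

# A constant factor cannot be moved out in general: n = 2, x = 1/2.
x, n = Fraction(1, 2), 2
print(floor(n * x), n * floor(x))  # 1 0
```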

It  turns  out  that  there  are  many  situations  in  which  floor  and  ceiling 
brackets  are  redundant,  so  that  we  can  insert  or  delete  them  at  will.  For 
example,  any  inequality  between  a real  and  an  integer  is  equivalent  to  a floor 
or  ceiling  inequality  between  integers: 


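$x < n \iff \lfloor x \rfloor < n$ ,    (a)
$n < x \iff n < \lceil x \rceil$ ,    (b)
$x \le n \iff \lceil x \rceil \le n$ ,    (c)
$n \le x \iff n \le \lfloor x \rfloor$ .    (d)        (3.7)

These rules are easily proved. For example, if x < n then surely $\lfloor x \rfloor < n$, since
$\lfloor x \rfloor \le x$. Conversely, if $\lfloor x \rfloor < n$ then we must have x < n, since $x < \lfloor x \rfloor + 1$
and $\lfloor x \rfloor + 1 \le n$.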
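
The four rules of (3.7) can also be tested mechanically. The following Python sketch compares each inequality with its floor or ceiling counterpart on a grid of exact rationals; the function name `rules_3_7_hold` and the grid are assumptions made only for this illustration, and a finite check is of course not a proof.

```python
from fractions import Fraction
from math import floor, ceil

def rules_3_7_hold(x, n):
    """True if all four equivalences of (3.7) hold for real x and integer n."""
    return ((x < n) == (floor(x) < n) and
            (n < x) == (n < ceil(x)) and
            (x <= n) == (ceil(x) <= n) and
            (n <= x) == (n <= floor(x)))

xs = [Fraction(p, 4) for p in range(-20, 21)]   # -5, -19/4, ..., 5
all_ok = all(rules_3_7_hold(x, n) for x in xs for n in range(-6, 7))
print(all_ok)
```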

It  would  be  nice  if  the  four  rules  in  (3.7)  were  as  easy  to  remember  as 
they  are  to  prove.  Each  inequality  without  floor  or  ceiling  corresponds  to  the 
same inequality with floor or with ceiling; but we need to think twice before
deciding  which  of  the  two  is  appropriate. 

The difference between x and $\lfloor x \rfloor$ is called the fractional part of x, and
it arises often enough in applications to deserve its own notation:

$\{x\} = x - \lfloor x \rfloor .$    (3.8)

We sometimes call $\lfloor x \rfloor$ the integer part of x, since $x = \lfloor x \rfloor + \{x\}$. If a real
number x can be written in the form $x = n + \theta$, where n is an integer and
$0 \le \theta < 1$, we can conclude by (3.5(a)) that $n = \lfloor x \rfloor$ and $\theta = \{x\}$.

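Identity (3.6) doesn’t hold if n is an arbitrary real. But we can deduce
that there are only two possibilities for $\lfloor x + y \rfloor$ in general: If we write $x =
\lfloor x \rfloor + \{x\}$ and $y = \lfloor y \rfloor + \{y\}$, then we have $\lfloor x + y \rfloor = \lfloor x \rfloor + \lfloor y \rfloor + \lfloor \{x\} + \{y\} \rfloor$.
And since $0 \le \{x\} + \{y\} < 2$, we find that sometimes $\lfloor x + y \rfloor$ is $\lfloor x \rfloor + \lfloor y \rfloor$,
otherwise it’s $\lfloor x \rfloor + \lfloor y \rfloor + 1$.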
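
The decomposition into integer and fractional parts is easy to experiment with. In this Python sketch the names `frac` and `carry` are hypothetical helpers, and exact rationals stand in for real numbers.

```python
from fractions import Fraction
from math import floor

def frac(x):
    """Fractional part {x} = x - floor(x), as in (3.8)."""
    return x - floor(x)

def carry(x, y):
    """floor({x} + {y}): 1 if the fractional parts carry, else 0."""
    return floor(frac(x) + frac(y))

x, y = Fraction(7, 4), Fraction(-1, 3)
print(frac(x), frac(y))   # 3/4 2/3
print(floor(x + y) == floor(x) + floor(y) + carry(x, y))  # True
```

Note that the fractional part of −1/3 is 2/3, not 1/3, because the floor of −1/3 is −1.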

3.2  FLOOR/CEILING  APPLICATIONS 

We’ve now seen the basic tools for handling floors and ceilings. Let’s
put them to use, starting with an easy problem: What’s $\lceil \lg 35 \rceil$? (We use ‘lg’
to denote the base-2 logarithm.) Well, since $2^5 < 35 \le 2^6$, we can take logs
to get $5 < \lg 35 \le 6$; so (3.5(c)) tells us that $\lceil \lg 35 \rceil = 6$.

Note that the number 35 is six bits long when written in radix 2 notation:
$35 = (100011)_2$. Is it always true that $\lceil \lg n \rceil$ is the length of n written in
binary? Not quite. We also need six bits to write $32 = (100000)_2$. So $\lceil \lg n \rceil$
is the wrong answer to the problem. (It fails only when n is a power of 2,
but that’s infinitely many failures.) We can find a correct answer by realizing
that it takes m bits to write each number n such that $2^{m-1} \le n < 2^m$; thus
(3.5(a)) tells us that $m - 1 = \lfloor \lg n \rfloor$, so $m = \lfloor \lg n \rfloor + 1$. That is, we need
$\lfloor \lg n \rfloor + 1$ bits to express n in binary, for all n > 0. Alternatively, a similar
derivation yields the answer $\lceil \lg(n + 1) \rceil$; this formula holds for n = 0 as well,
if we’re willing to say that it takes zero bits to write n = 0 in binary.
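
The two bit-length formulas can be compared against the actual binary representation. This Python sketch computes the logarithms with integer loops rather than floating-point `log2`, to avoid rounding trouble near powers of 2; the helper names `floor_lg` and `ceil_lg` are illustrative.

```python
def floor_lg(n):
    """Largest k with 2**k <= n, for an integer n >= 1."""
    k = 0
    while 2 ** (k + 1) <= n:
        k += 1
    return k

def ceil_lg(n):
    """Smallest k with 2**k >= n, for an integer n >= 1."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k

for n in (1, 31, 32, 35):
    bits = len(bin(n)) - 2   # length of n written in binary
    # columns: n, true length, floor(lg n) + 1, ceil(lg(n + 1)), ceil(lg n)
    print(n, bits, floor_lg(n) + 1, ceil_lg(n + 1), ceil_lg(n))
```

The last column disagrees with the true length exactly when n is a power of 2, such as 1 and 32.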

Let’s look next at expressions with several floors or ceilings. What is
$\lceil \lfloor x \rfloor \rceil$? Easy. Since $\lfloor x \rfloor$ is an integer, $\lceil \lfloor x \rfloor \rceil$ is just $\lfloor x \rfloor$. So is any other
expression with an innermost $\lfloor x \rfloor$ surrounded by any number of floors or ceilings.

Here’s a tougher problem: Prove or disprove the assertion

$\lfloor \sqrt{\lfloor x \rfloor} \rfloor = \lfloor \sqrt{x} \rfloor$ ,  real $x \ge 0$.    (3.9)

Equality obviously holds when x is an integer, because $x = \lfloor x \rfloor$. And there’s
equality in the special cases $\pi = 3.14159\ldots$, $e = 2.71828\ldots$, and $\phi =
(1 + \sqrt{5})/2 = 1.61803\ldots$, because we get 1 = 1. Our failure to find a
counterexample suggests that equality holds in general, so let’s try to prove it.


Hmmm. We'd better
not write {x}
for the fractional
part when it could
be confused with
the set containing x
as its only element.


The  second  case 
occurs  if  and  only 
if  there's  a "carry" 
at  the  position  of 
the  decimal  point, 
when  the  fractional 
parts  {x}  and  {y} 
are  added  together. 


(Of course π, e,
and φ are the
obvious first real
numbers to try,
aren't they?)
Skepticism is
healthy only to
a limited extent.
Being skeptical
about proofs and
programs (particularly
your own) will
probably keep your
grades healthy and
your job fairly secure.
But applying
that much skepticism
will probably
also keep you shut
away working all
the time, instead
of letting you get
out for exercise and
relaxation.

Too much skepticism
is an open invitation
to the state
of rigor mortis,
where you become
so worried about
being correct and
rigorous that you
never get anything
finished.

-A  skeptic 


(This  observation 
was  made  by  R.  J. 
McEliece  when  he 
was  an  undergrad.) 


Incidentally, when we’re faced with a “prove or disprove,” we’re usually
better off trying first to disprove with a counterexample, for two reasons:
A disproof is potentially easier (we need just one counterexample); and
nitpicking arouses our creative juices. Even if the given assertion is true, our
search for a counterexample often leads us to a proof, as soon as we see why
a counterexample is impossible. Besides, it’s healthy to be skeptical.

If we try to prove that $\lfloor \sqrt{\lfloor x \rfloor} \rfloor = \lfloor \sqrt{x} \rfloor$ with the help of calculus, we might
start by decomposing x into its integer and fractional parts $\lfloor x \rfloor + \{x\} = n + \theta$
and then expanding the square root using the binomial theorem: $(n + \theta)^{1/2} =
n^{1/2} + n^{-1/2}\theta/2 - n^{-3/2}\theta^2/8 + \cdots$. But this approach gets pretty messy.

It’s much easier to use the tools we’ve developed. Here’s a possible
strategy: Somehow strip off the outer floor and square root of $\lfloor \sqrt{\lfloor x \rfloor} \rfloor$, then
remove the inner floor, then add back the outer stuff to get $\lfloor \sqrt{x} \rfloor$. OK. We let
$m = \lfloor \sqrt{\lfloor x \rfloor} \rfloor$ and invoke (3.5(a)), giving $m \le \sqrt{\lfloor x \rfloor} < m + 1$. That removes
the outer floor bracket without losing any information. Squaring, since all
three expressions are nonnegative, we have $m^2 \le \lfloor x \rfloor < (m + 1)^2$. That gets
rid of the square root. Next we remove the floor, using (3.7(d)) for the left
inequality and (3.7(a)) for the right: $m^2 \le x < (m + 1)^2$. It’s now a simple
matter to retrace our steps, taking square roots to get $m \le \sqrt{x} < m + 1$ and
invoking (3.5(a)) to get $m = \lfloor \sqrt{x} \rfloor$. Thus $\lfloor \sqrt{\lfloor x \rfloor} \rfloor = m = \lfloor \sqrt{x} \rfloor$; the assertion
is true. Similarly, we can prove that

$\lceil \sqrt{\lceil x \rceil} \rceil = \lceil \sqrt{x} \rceil$ ,  real $x \ge 0$.
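
Both identities can be spot-checked numerically. The Python sketch below finds the floor and ceiling of a square root by direct search over integers, so it does not presuppose the identity being tested; the helper names and the sample grid (multiples of 1/8 from 0 to 100) are choices made for illustration.

```python
from fractions import Fraction
from math import floor, ceil

def floor_sqrt(x):
    """Largest integer m with m*m <= x, found by direct search (x >= 0)."""
    m = 0
    while (m + 1) ** 2 <= x:
        m += 1
    return m

def ceil_sqrt(x):
    """Smallest integer m with m*m >= x, found by direct search (x >= 0)."""
    m = 0
    while m * m < x:
        m += 1
    return m

xs = [Fraction(p, 8) for p in range(0, 801)]
floors_agree = all(floor_sqrt(floor(x)) == floor_sqrt(x) for x in xs)
ceilings_agree = all(ceil_sqrt(ceil(x)) == ceil_sqrt(x) for x in xs)
print(floors_agree, ceilings_agree)
```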

The proof we just found doesn’t rely heavily on the properties of square
roots. A closer look shows that we can generalize the ideas and prove much
more: Let f(x) be any continuous, monotonically increasing function with the
property that

f(x) = integer  $\Longrightarrow$  x = integer.

(The symbol ‘$\Longrightarrow$’ means “implies.”) Then we have

$\lfloor f(x) \rfloor = \lfloor f(\lfloor x \rfloor) \rfloor$  and  $\lceil f(x) \rceil = \lceil f(\lceil x \rceil) \rceil$ ,    (3.10)

whenever f(x), $f(\lfloor x \rfloor)$, and $f(\lceil x \rceil)$ are defined. Let’s prove this general
property for ceilings, since we did floors earlier and since the proof for floors is
almost the same. If $x = \lceil x \rceil$, there’s nothing to prove. Otherwise $x < \lceil x \rceil$,
and $f(x) < f(\lceil x \rceil)$ since f is increasing. Hence $\lceil f(x) \rceil \le \lceil f(\lceil x \rceil) \rceil$, since $\lceil\ \rceil$ is
nondecreasing. If $\lceil f(x) \rceil < \lceil f(\lceil x \rceil) \rceil$, there must be a number y such that
$x \le y < \lceil x \rceil$ and $f(y) = \lceil f(x) \rceil$, since f is continuous. This y is an integer,
because of f’s special property. But there cannot be an integer strictly between
x and $\lceil x \rceil$. This contradiction implies that we must have $\lceil f(x) \rceil = \lceil f(\lceil x \rceil) \rceil$.
An important special case of this theorem is worth noting explicitly:

$\left\lfloor \frac{x + m}{n} \right\rfloor = \left\lfloor \frac{\lfloor x \rfloor + m}{n} \right\rfloor$  and  $\left\lceil \frac{x + m}{n} \right\rceil = \left\lceil \frac{\lceil x \rceil + m}{n} \right\rceil$ ,    (3.11)

if m and n are integers and the denominator n is positive. For example, let
m = 0; we have $\lfloor \lfloor \lfloor x/10 \rfloor / 10 \rfloor / 10 \rfloor = \lfloor x/1000 \rfloor$. Dividing thrice by 10 and
throwing off digits is the same as dividing by 1000 and tossing the remainder.
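
The special case is the reason repeated integer division behaves well in programs. Here is a minimal Python sketch, with exact rationals standing in for real x; the function name `special_case_holds` and the sample values are illustrative assumptions.

```python
from fractions import Fraction
from math import floor, ceil

def special_case_holds(x, m, n):
    """Check both halves of (3.11) for real x, integer m, positive integer n."""
    return (floor((x + m) / n) == floor(Fraction(floor(x) + m, n)) and
            ceil((x + m) / n) == ceil(Fraction(ceil(x) + m, n)))

xs = [Fraction(p, 5) for p in range(-40, 41)]
all_ok = all(special_case_holds(x, m, n)
             for x in xs for m in range(-3, 4) for n in range(1, 6))
print(all_ok)

# Dividing thrice by 10 (discarding remainders) equals dividing once by 1000.
x = Fraction(98765432, 7)
thrice = floor(x / 10) // 10 // 10   # // is floor division on integers
print(thrice == floor(x / 1000))
```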

Let’s try now to prove or disprove another statement:

$\lceil \sqrt{\lfloor x \rfloor} \rceil = \lceil \sqrt{x} \rceil$ ,  real $x \ge 0$.

This works when $x = \pi$ and $x = e$, but it fails when $x = \phi$; so we know that
it isn’t true in general.

Before going any further, let’s digress a minute to discuss different
“levels” of questions that can be asked in books about mathematics:

Level 1. Given an explicit object x and an explicit property P(x), prove that
P(x) is true. For example, “Prove that $\lfloor \pi \rfloor = 3$.” Here the problem involves
finding a proof of some purported fact.

Level 2. Given an explicit set X and an explicit property P(x), prove that
P(x) is true for all $x \in X$. For example, “Prove that $\lfloor x \rfloor \le x$ for all real x.”
Again the problem involves finding a proof, but the proof this time must be
general. We’re doing algebra, not just arithmetic.

Level 3. Given an explicit set X and an explicit property P(x), prove or
disprove that P(x) is true for all $x \in X$. For example, “Prove or disprove
that $\lceil \sqrt{\lfloor x \rfloor} \rceil = \lceil \sqrt{x} \rceil$ for all real $x \ge 0$.” Here there’s an additional level
of uncertainty; the outcome might go either way. This is closer to the real
situation a mathematician constantly faces: Assertions that get into books
tend to be true, but new things have to be looked at with a jaundiced eye. If
the statement is false, our job is to find a counterexample. If the statement
is true, we must find a proof as in level 2.

Level 4. Given an explicit set X and an explicit property P(x), find a
necessary and sufficient condition Q(x) that P(x) is true. For example, “Find a
necessary and sufficient condition that $\lfloor x \rfloor \ge \lceil x \rceil$.” The problem is to find Q
such that $P(x) \iff Q(x)$. Of course, there’s always a trivial answer; we can
take Q(x) = P(x). But the implied requirement is to find a condition that’s as
simple as possible. Creativity is required to discover a simple condition that
will work. (For example, in this case, “$\lfloor x \rfloor \ge \lceil x \rceil \iff x$ is an integer.”) The
extra element of discovery needed to find Q(x) makes this sort of problem
more difficult, but it’s more typical of what mathematicians must do in the
“real world.” Finally, of course, a proof must be given that P(x) is true if and
only if Q(x) is true.


In  my  other  texts 

“prove  or  disprove” 
seems  to  mean  the 
same  as  “prove,” 
about  99.44%  of 
the  time;  but  not 
in  this  book. 


But  no  simpler. 

-A.  Einstein 
Home of the
Toledo Mudhens.


(Or, by pessimists,
half-closed.)

Level  5.  Given  an  explicit  set  X,  find  an  interesting  property  P(x)  of  its 
elements.  Now  we’re  in  the  scary  domain  of  pure  research,  where  students 
might  think  that  total  chaos  reigns.  This  is  real  mathematics.  Authors  of 
textbooks  rarely  dare  to  ask  level  5 questions. 

End of digression. But let’s convert our last question from level 3 to
level 4: What is a necessary and sufficient condition that $\lceil \sqrt{\lfloor x \rfloor} \rceil = \lceil \sqrt{x} \rceil$?
We have observed that equality holds when x = 3.142 but not when x = 1.618;
further experimentation shows that it fails also when x is between 9 and 10.
Oho. Yes. We see that bad cases occur whenever $m^2 < x < m^2 + 1$, since this
gives m on the left and m + 1 on the right. In all other cases where $\sqrt{x}$ is
defined, namely when x = 0 or $m^2 + 1 \le x \le (m + 1)^2$, we get equality. The
following statement is therefore necessary and sufficient for equality: Either
x is an integer or $\sqrt{\lfloor x \rfloor}$ isn’t.
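
The level 4 answer can be checked experimentally. This Python sketch compares the equality with the stated condition on a grid of exact rationals between 0 and 100; the helper names `ceil_sqrt`, `equality_holds`, and `condition` are hypothetical, and agreement on a grid is evidence rather than proof.

```python
from fractions import Fraction
from math import floor, isqrt

def ceil_sqrt(x):
    """Smallest integer m with m*m >= x, found by direct search (x >= 0)."""
    m = 0
    while m * m < x:
        m += 1
    return m

def equality_holds(x):
    """Does ceil(sqrt(floor(x))) equal ceil(sqrt(x))?"""
    return ceil_sqrt(floor(x)) == ceil_sqrt(x)

def condition(x):
    """Either x is an integer, or sqrt(floor(x)) is not an integer."""
    n = floor(x)
    return x == n or isqrt(n) ** 2 != n

xs = [Fraction(p, 8) for p in range(0, 801)]
agree = all(equality_holds(x) == condition(x) for x in xs)
print(agree)
```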

For our next problem let’s consider a handy new notation, suggested
by C. A. R. Hoare and Lyle Ramshaw, for intervals of the real line: $[\alpha\,..\,\beta]$
denotes the set of real numbers x such that $\alpha \le x \le \beta$. This set is called
a closed interval because it contains both endpoints $\alpha$ and $\beta$. The interval
containing neither endpoint, denoted by $(\alpha\,..\,\beta)$, consists of all x such that
$\alpha < x < \beta$; this is called an open interval. And the intervals $[\alpha\,..\,\beta)$ and
$(\alpha\,..\,\beta]$, which contain just one endpoint, are defined similarly and called
half-open.

How many integers are contained in such intervals? The half-open
intervals are easier, so we start with them. In fact half-open intervals are almost
always nicer than open or closed intervals. For example, they’re additive: we
can combine the half-open intervals $[\alpha\,..\,\beta)$ and $[\beta\,..\,\gamma)$ to form the half-open
interval $[\alpha\,..\,\gamma)$. This wouldn’t work with open intervals because the point $\beta$
would be excluded, and it could cause problems with closed intervals because
$\beta$ would be included twice.

Back to our problem. The answer is easy if $\alpha$ and $\beta$ are integers: Then
$[\alpha\,..\,\beta)$ contains the $\beta - \alpha$ integers $\alpha, \alpha + 1, \ldots, \beta - 1$, assuming that $\alpha \le \beta$.
Similarly $(\alpha\,..\,\beta]$ contains $\beta - \alpha$ integers in such a case. But our problem is
harder, because $\alpha$ and $\beta$ are arbitrary reals. We can convert it to the easier
problem, though, since

$\alpha \le n < \beta \iff \lceil \alpha \rceil \le n < \lceil \beta \rceil$ ,

$\alpha < n \le \beta \iff \lfloor \alpha \rfloor < n \le \lfloor \beta \rfloor$ ,

when n is an integer, according to (3.7). The intervals on the right have
integer endpoints and contain the same number of integers as those on the left,
which have real endpoints. So the interval $[\alpha\,..\,\beta)$ contains exactly $\lceil \beta \rceil - \lceil \alpha \rceil$
integers, and $(\alpha\,..\,\beta]$ contains $\lfloor \beta \rfloor - \lfloor \alpha \rfloor$. This is a case where we actually
want to introduce floor or ceiling brackets, instead of getting rid of them.
By the way, there’s a mnemonic for remembering which case uses floors
and which uses ceilings: Half-open intervals that include the left endpoint
but not the right (such as $0 \le \theta < 1$) are slightly more common than those
that include the right endpoint but not the left; and floors are slightly more
common than ceilings. So by Murphy’s Law, the correct rule is the opposite
of what we’d expect: ceilings for $[\alpha\,..\,\beta)$ and floors for $(\alpha\,..\,\beta]$.

Similar analyses show that the closed interval $[\alpha\,..\,\beta]$ contains exactly
$\lfloor \beta \rfloor - \lceil \alpha \rceil + 1$ integers and that the open interval $(\alpha\,..\,\beta)$ contains $\lceil \beta \rceil - \lfloor \alpha \rfloor - 1$;
but we place the additional restriction $\alpha \ne \beta$ on the latter so that the formula
won’t ever embarrass us by claiming that an empty interval $(\alpha\,..\,\alpha)$ contains
a total of −1 integers. To summarize, we’ve deduced the following facts:


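interval                 integers contained                                  restrictions

$[\alpha\,..\,\beta]$    $\lfloor \beta \rfloor - \lceil \alpha \rceil + 1$   $\alpha \le \beta$ ,
$[\alpha\,..\,\beta)$    $\lceil \beta \rceil - \lceil \alpha \rceil$         $\alpha \le \beta$ ,
$(\alpha\,..\,\beta]$    $\lfloor \beta \rfloor - \lfloor \alpha \rfloor$     $\alpha \le \beta$ ,
$(\alpha\,..\,\beta)$    $\lceil \beta \rceil - \lfloor \alpha \rfloor - 1$   $\alpha < \beta$ .        (3.12)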
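
The four counting formulas translate directly into code. In the Python sketch below, the function names and the brute-force checker `brute` are hypothetical helpers introduced only to compare the formulas against explicit enumeration.

```python
from fractions import Fraction
from math import floor, ceil

def count_closed(a, b):        # [a .. b], assumes a <= b
    return floor(b) - ceil(a) + 1

def count_right_open(a, b):    # [a .. b), assumes a <= b
    return ceil(b) - ceil(a)

def count_left_open(a, b):     # (a .. b], assumes a <= b
    return floor(b) - floor(a)

def count_open(a, b):          # (a .. b), assumes a < b
    return ceil(b) - floor(a) - 1

def brute(a, b, left_closed, right_closed):
    """Count integers in the interval by testing every nearby candidate."""
    return sum(1 for n in range(floor(a) - 1, ceil(b) + 2)
               if (a <= n if left_closed else a < n)
               and (n <= b if right_closed else n < b))

a, b = Fraction(-2), Fraction(7, 2)
print(count_closed(a, b), count_right_open(a, b),
      count_left_open(a, b), count_open(a, b))  # 6 6 5 5
```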

Now  here’s  a problem  we  can’t  refuse.  The  Concrete  Math  Club  has  a 
casino  (open  only  to  purchasers  of  this  book)  in  which  there’s  a roulette  wheel 
with  one  thousand  slots,  numbered  1 to  1000.  If  the  number  n that  comes  up 
on  a spin  is  divisible  by  the  floor  of  its  cube  root,  that  is,  if 

$\lfloor \sqrt[3]{n} \rfloor \backslash n$ ,

then  it’s  a winner  and  the  house  pays  us  $5;  otherwise  it’s  a loser  and  we 
must  pay  $1.  (The  notation  a\b,  read  “a  divides  b,”  means  that  b is  an  exact 
multiple  of  a;  Chapter  4 investigates  this  relation  carefully.)  Can  we  expect 
to  make  money  if  we  play  this  game? 

We can compute the average winnings (that is, the amount we’ll win
or lose per play) by first counting the number W of winners and the
number L = 1000 − W of losers. If each number comes up once during 1000 plays,
we win 5W dollars and lose L dollars, so the average winnings will be

$$\frac{5W - L}{1000} = \frac{5W - (1000 - W)}{1000} = \frac{6W - 1000}{1000} .$$

If there are 167 or more winners, we have the advantage; otherwise the
advantage is with the house.

How can we count the number of winners among 1 through 1000? It’s
not hard to spot a pattern. The numbers from 1 through $2^3 - 1 = 7$ are all
winners because $\lfloor \sqrt[3]{n} \rfloor = 1$ for each. Among the numbers $2^3 = 8$ through
$3^3 - 1 = 26$, only the even numbers are winners. And among $3^3 = 27$ through
$4^3 - 1 = 63$, only those divisible by 3 are. And so on.


Just like we can remember
the date of
Columbus’s departure
by singing, “In
fourteen hundred
and ninety-three/
Columbus sailed the
deep blue sea.”


(A poll of the class
at this point showed
that 28 students
thought it was a
bad idea to play,
13 wanted to gamble,
and the rest
were too confused
to answer.)

(So  we  hit  them 
with  the  Concrete 
Math  Club.) 
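The whole setup can be analyzed systematically if we use the
summation techniques of Chapter 2, taking advantage of Iverson's convention about
logical statements evaluating to 0 or 1:


True.

Where did you say
this casino is?


$$\begin{aligned}
W &= \sum_{n=1}^{1000} [n \text{ is a winner}] \\
  &= \sum_{1 \le n \le 1000} \bigl[\lfloor \sqrt[3]{n} \rfloor \backslash n\bigr] = \sum_{k,n} \bigl[k = \lfloor \sqrt[3]{n} \rfloor\bigr] [k \backslash n] [1 \le n \le 1000] \\
  &= \sum_{k,m,n} \bigl[k^3 \le n < (k+1)^3\bigr] [n = km] [1 \le n \le 1000] \\
  &= 1 + \sum_{k,m} \bigl[k^3 \le km < (k+1)^3\bigr] [1 \le k < 10] \\
  &= 1 + \sum_{k,m} \bigl[m \in [k^2 \,..\, (k+1)^3/k)\bigr] [1 \le k < 10] \\
  &= 1 + \sum_{1 \le k < 10} \bigl(\lceil k^2 + 3k + 3 + 1/k \rceil - \lceil k^2 \rceil\bigr) \\
  &= 1 + \sum_{1 \le k < 10} (3k + 4) = 1 + \frac{7 + 31}{2} \cdot 9 = 172 .
\end{aligned}$$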

This derivation merits careful study. Notice that line 6 uses our formula
(3.12) for the number of integers in a half-open interval. The only “difficult”
maneuver is the decision made between lines 3 and 4 to treat n = 1000 as a
special case. (The inequality $k^3 \le n < (k + 1)^3$ does not combine easily with
$1 \le n \le 1000$ when k = 10.) In general, boundary conditions tend to be the
most critical part of $\sum$-manipulations.

The bottom line says that W = 172; hence our formula for average
winnings per play reduces to $(6 \cdot 172 - 1000)/1000$ dollars, which is 3.2 cents. We
can expect to be about $3.20 richer after making 100 bets of $1 each. (Of
course, the house may have made some numbers more equal than others.)
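
The count W = 172 is small enough to confirm by brute force. The following Python sketch enumerates the wheel directly; the helper names `icbrt` and `winners` are hypothetical, and the integer cube root is corrected after a floating-point first guess so that the result is exact.

```python
def icbrt(n):
    """Floor of the cube root of a nonnegative integer, computed exactly."""
    k = round(n ** (1 / 3))      # floating-point first guess
    while k ** 3 > n:            # correct any overshoot
        k -= 1
    while (k + 1) ** 3 <= n:     # correct any undershoot
        k += 1
    return k

def winners(N):
    """Count n in 1..N that are divisible by floor(cbrt(n))."""
    return sum(1 for n in range(1, N + 1) if n % icbrt(n) == 0)

W = winners(1000)
average = (6 * W - 1000) / 1000  # dollars won per play, on average
print(W, average)
```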

The casino problem we just solved is a dressed-up version of the more
mundane question, “How many integers n, where $1 \le n \le 1000$, satisfy the
relation $\lfloor \sqrt[3]{n} \rfloor \backslash n$?” Mathematically the two questions are the same. But
sometimes it’s a good idea to dress up a problem. We get to use more vocabulary
(like “winners” and “losers”), which helps us to understand what’s going on.

Let’s  get  general.  Suppose  we  change  1000  to  1000000,  or  to  an  even 
larger  number,  N . (We  assume  that  the  casino  has  connections  and  can  get  a 
bigger  wheel.)  Now  how  many  winners  are  there? 

The  same  argument  applies,  but  we  need  to  deal  more  carefully  with  the 
largest  value  of  k,  which  we  can  call  K for  convenience: 


$K = \lfloor \sqrt[3]{N} \rfloor .$
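(Previously K was 10.) The total number of winners for general N comes to

$$\begin{aligned}
W &= \sum_{1 \le k < K} (3k + 4) + \sum_{m} \bigl[K^3 \le Km \le N\bigr] \\
  &= \tfrac{1}{2} (7 + 3K + 1)(K - 1) + \sum_{m} \bigl[m \in [K^2 \,..\, N/K]\bigr] \\
  &= \tfrac{3}{2} K^2 + \tfrac{5}{2} K - 4 + \sum_{m} \bigl[m \in [K^2 \,..\, N/K]\bigr] .
\end{aligned}$$

We know that the remaining sum is $\lfloor N/K \rfloor - \lceil K^2 \rceil + 1 = \lfloor N/K \rfloor - K^2 + 1$;
hence the formula

$W = \lfloor N/K \rfloor + \tfrac{1}{2} K^2 + \tfrac{5}{2} K - 3$ ,   $K = \lfloor \sqrt[3]{N} \rfloor$    (3.13)

gives the general answer for a wheel of size N.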
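
Formula (3.13) can be checked against direct counting for every wheel size up to a few thousand. In this Python sketch the names `icbrt` and `winners_formula` are hypothetical; the formula is evaluated in integer arithmetic, which is safe because $K^2 + 5K = K(K + 5)$ is always even.

```python
def icbrt(n):
    """Floor of the cube root of a nonnegative integer, computed exactly."""
    k = round(n ** (1 / 3))
    while k ** 3 > n:
        k -= 1
    while (k + 1) ** 3 <= n:
        k += 1
    return k

def winners_formula(N):
    """Closed form (3.13): floor(N/K) + K^2/2 + 5K/2 - 3 with K = floor(cbrt(N))."""
    K = icbrt(N)
    return N // K + (K * K + 5 * K) // 2 - 3

# Compare with a running brute-force count for N = 1, 2, ..., 5000.
count, matches = 0, True
for N in range(1, 5001):
    count += (N % icbrt(N) == 0)
    matches = matches and (winners_formula(N) == count)
print(matches)
print(winners_formula(1000), winners_formula(10 ** 6))  # 172 15247
```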

The first two terms of this formula are approximately $N^{2/3} + \tfrac{1}{2} N^{2/3} =
\tfrac{3}{2} N^{2/3}$, and the other terms are much smaller in comparison, when N is large.
In Chapter 9 we’ll learn how to derive expressions like

$W = \tfrac{3}{2} N^{2/3} + O(N^{1/3})$ ,

where $O(N^{1/3})$ stands for a quantity that is no more than a constant times
$N^{1/3}$. Whatever the constant is, we know that it’s independent of N; so for
large N the contribution of the O-term to W will be quite small compared
with $\tfrac{3}{2} N^{2/3}$. For example, the following table shows how close $\tfrac{3}{2} N^{2/3}$ is to W:


N                $\tfrac{3}{2} N^{2/3}$     W          % error

1,000                    150.0              172        12.791
10,000                   696.2              746         6.670
100,000                 3231.7             3343         3.331
1,000,000              15000.0            15247         1.620
10,000,000             69623.8            70158         0.761
100,000,000           323165.2           324322         0.357
1,000,000,000        1500000.0          1502497         0.166

It’s  a pretty  good  approximation. 
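
The rows of such a table can be recomputed directly from (3.13). This is a hedged sketch rather than the authors' procedure; `icbrt` and `approximation_row` are names made up for the illustration, and the percentage is taken relative to the exact value W.

```python
def icbrt(n: int) -> int:
    """Return floor(cube root of n) exactly, for n >= 0."""
    k = round(n ** (1.0 / 3.0))
    while k ** 3 > n:
        k -= 1
    while (k + 1) ** 3 <= n:
        k += 1
    return k


def approximation_row(N: int):
    """Return ((3/2) N^(2/3), W, percent error) for a wheel of size N."""
    K = icbrt(N)
    W = N // K + (K * K + 5 * K) // 2 - 3        # exact count, formula (3.13)
    approx = 1.5 * N ** (2.0 / 3.0)              # leading term of W
    percent_error = 100.0 * (W - approx) / W     # shortfall relative to W
    return approx, W, percent_error


if __name__ == "__main__":
    for exponent in range(3, 10):
        approx, W, err = approximation_row(10 ** exponent)
        print(f"{10 ** exponent:>13,} {approx:>12.1f} {W:>9} {err:>7.3f}")
```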

Approximate formulas are useful because they’re simpler than formulas
with floors and ceilings. However, the exact truth is often important,
too, especially for the smaller values of N that tend to occur in practice.
For example, the casino owner may have falsely assumed that there are only
$\tfrac32 N^{2/3} = 150$ winners when N = 1000 (in which case there would be a 10%
advantage for the house).

. . . without lots
of generality . . .


“If  x be  an  in- 
commensurable 
number  less  than 
unity,  one  of  the 
series  of  quantities 
m/x, m/(1 - x),
where m is a whole
number, can be
found which shall
lie between any
given  consecutive 
integers,  and  but 
one  such  quantity 
can  be  found.” 

— Rayleigh  [245] 


Right,  because 
exactly one of
the counts must
increase when n
increases by 1.


Our last application in this section looks at so-called spectra. We define
the spectrum of a real number $\alpha$ to be an infinite multiset of integers,

$$ \mathrm{Spec}(\alpha) = \{ \lfloor \alpha \rfloor, \lfloor 2\alpha \rfloor, \lfloor 3\alpha \rfloor, \ldots \} . $$

(A multiset is like a set but it can have repeated elements.) For example, the
spectrum of 1/2 starts out {0, 1, 1, 2, 2, 3, 3, . . . }.

It’s easy to prove that no two spectra are equal, that is, that $\alpha \ne \beta$ implies
$\mathrm{Spec}(\alpha) \ne \mathrm{Spec}(\beta)$. For, assuming without loss of generality that $\alpha < \beta$,
there’s a positive integer m such that $m(\beta - \alpha) \ge 1$. (In fact, any
$m \ge \lceil 1/(\beta - \alpha) \rceil$ will do; but we needn’t show off our knowledge of floors and
ceilings all the time.) Hence $m\beta - m\alpha \ge 1$, and $\lfloor m\beta \rfloor > \lfloor m\alpha \rfloor$. Thus
$\mathrm{Spec}(\beta)$ has fewer than m elements $\le \lfloor m\alpha \rfloor$, while $\mathrm{Spec}(\alpha)$ has at least m.

Spectra have many beautiful properties. For example, consider the two
multisets

$$ \begin{aligned}
\mathrm{Spec}(\sqrt2) &= \{1,2,4,5,7,8,9,11,12,14,15,16,18,19,21,22,24,\ldots\} , \\
\mathrm{Spec}(2+\sqrt2) &= \{3,6,10,13,17,20,23,27,30,34,37,40,44,47,51,\ldots\} .
\end{aligned} $$

It’s easy to calculate $\mathrm{Spec}(\sqrt2)$ with a pocket calculator, and the nth element
of $\mathrm{Spec}(2+\sqrt2)$ is just 2n more than the nth element of $\mathrm{Spec}(\sqrt2)$, by (3.6).
A closer look shows that these two spectra are also related in a much more
surprising way: It seems that any number missing from one is in the other,
but that no number is in both! And it’s true: The positive integers are the
disjoint union of $\mathrm{Spec}(\sqrt2)$ and $\mathrm{Spec}(2+\sqrt2)$. We say that these spectra form
a partition of the positive integers.
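
The two spectra can be generated exactly, without a pocket calculator's rounding, because $\lfloor k\sqrt2 \rfloor$ equals the integer square root of $2k^2$. The sketch below is only an illustration with invented function names; it checks the partition claim up to a finite limit, which of course proves nothing in general.

```python
from math import isqrt


def spec_sqrt2(count: int) -> list:
    """First `count` elements of Spec(sqrt 2): floor(k*sqrt(2)) = isqrt(2*k*k)."""
    return [isqrt(2 * k * k) for k in range(1, count + 1)]


def spec_2_plus_sqrt2(count: int) -> list:
    """First `count` elements of Spec(2 + sqrt 2): 2k more than the element above."""
    return [2 * k + isqrt(2 * k * k) for k in range(1, count + 1)]


def is_partition_up_to(limit: int) -> bool:
    """Check that 1..limit each occur exactly once across the two spectra."""
    # The kth element of either spectrum is at least k, so indices up to
    # `limit` are enough to see every element that is <= limit.
    small = [x for x in spec_sqrt2(limit) if x <= limit]
    large = [x for x in spec_2_plus_sqrt2(limit) if x <= limit]
    return sorted(small + large) == list(range(1, limit + 1))
```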

To prove this assertion, we will count how many of the elements of
$\mathrm{Spec}(\sqrt2)$ are $\le n$, and how many of the elements of $\mathrm{Spec}(2+\sqrt2)$ are $\le n$. If
the total is n, for each n, these two spectra do indeed partition the integers.
Let $\alpha$ be positive. The number of elements in $\mathrm{Spec}(\alpha)$ that are $\le n$ is

$$ \begin{aligned}
N(\alpha, n) &= \sum_{k>0} [\lfloor k\alpha \rfloor \le n] \\
  &= \sum_{k>0} [\lfloor k\alpha \rfloor < n+1] \\
  &= \sum_{k>0} [k\alpha < n+1] \\
  &= \sum_k [0 < k < (n+1)/\alpha] \\
  &= \lceil (n+1)/\alpha \rceil - 1 .
\end{aligned} \tag{3.14} $$

This derivation has two special points of interest. First, it uses the law

$$ m \le n \iff m < n+1 , \qquad \text{integers } m \text{ and } n \tag{3.15} $$

to change ‘$\le$’ to ‘$<$’, so that the floor brackets can be removed by (3.7).
Also (and this is more subtle) it sums over the range k > 0 instead of $k \ge 1$,
because $(n+1)/\alpha$ might be less than 1 for certain n and $\alpha$. If we had tried
to apply (3.12) to determine the number of integers in $[1 .. (n+1)/\alpha)$, rather
than the number of integers in $(0 .. (n+1)/\alpha)$, we would have gotten the right
answer; but our derivation would have been faulty because the conditions of
applicability wouldn’t have been met.
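
Formula (3.14) holds for every positive $\alpha$, so it can be spot-checked with exact rational values. In this sketch the names `count_brute` and `count_formula` are hypothetical, and `fractions.Fraction` is used so that no rounding enters; the test values include cases where $(n+1)/\alpha$ is less than 1 and cases where it is an integer.

```python
from fractions import Fraction
from math import ceil, floor


def count_brute(alpha: Fraction, n: int) -> int:
    """Count the k >= 1 with floor(k*alpha) <= n by direct enumeration."""
    count, k = 0, 1
    while floor(k * alpha) <= n:   # terminates because alpha > 0
        count += 1
        k += 1
    return count


def count_formula(alpha: Fraction, n: int) -> int:
    """Formula (3.14): N(alpha, n) = ceil((n + 1)/alpha) - 1."""
    return ceil((n + 1) / alpha) - 1
```

For instance, with $\alpha = 1/2$ and n = 0 both functions return 1, since only the first element of the spectrum {0, 1, 1, 2, 2, . . . } is $\le 0$.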

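Good, we have a formula for $N(\alpha, n)$. Now we can test whether or not
$\mathrm{Spec}(\sqrt2)$ and $\mathrm{Spec}(2+\sqrt2)$ partition the positive integers, by testing whether
or not $N(\sqrt2, n) + N(2+\sqrt2, n) = n$ for all integers n > 0, using (3.14):

$$ \begin{aligned}
& \left\lceil \frac{n+1}{\sqrt2} \right\rceil - 1 + \left\lceil \frac{n+1}{2+\sqrt2} \right\rceil - 1 = n \\
\iff\ & \left\lfloor \frac{n+1}{\sqrt2} \right\rfloor + \left\lfloor \frac{n+1}{2+\sqrt2} \right\rfloor = n , \qquad \text{by (3.2);} \\
\iff\ & \frac{n+1}{\sqrt2} - \left\{ \frac{n+1}{\sqrt2} \right\} + \frac{n+1}{2+\sqrt2} - \left\{ \frac{n+1}{2+\sqrt2} \right\} = n , \qquad \text{by (3.8).}
\end{aligned} $$

Everything simplifies now because of the neat identity

$$ \frac{1}{\sqrt2} + \frac{1}{2+\sqrt2} = 1 ; $$

our condition reduces to testing whether or not

$$ \left\{ \frac{n+1}{\sqrt2} \right\} + \left\{ \frac{n+1}{2+\sqrt2} \right\} = 1 , $$

for all n > 0. And we win, because these are the fractional parts of two
noninteger numbers that add up to the integer n + 1. A partition it is.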


3.3  FLOOR/CEILING  RECURRENCES 

Floors and ceilings add an interesting new dimension to the study
of recurrence relations. Let’s look first at the recurrence

$$ \begin{aligned}
K_0 &= 1 ; \\
K_{n+1} &= 1 + \min(2K_{\lfloor n/2 \rfloor}, 3K_{\lfloor n/3 \rfloor}) , \qquad \text{for } n \ge 0.
\end{aligned} \tag{3.16} $$

Thus, for example, $K_1$ is $1 + \min(2K_0, 3K_0) = 3$; the sequence begins 1, 3, 3,
4, 7, 7, 7, 9, 9, 10, 13, . . . . One of the authors of this book has modestly
decided to call these the Knuth numbers.

Exercise 25 asks for a proof or disproof that $K_n \ge n$, for all $n \ge 0$. The
first few K’s just listed do satisfy the inequality, so there’s a good chance that
it’s true in general. Let’s try an induction proof: The basis n = 0 comes
directly from the defining recurrence. For the induction step, we assume
that the inequality holds for all values up through some fixed nonnegative n,
and we try to show that $K_{n+1} \ge n + 1$. From the recurrence we know that
$K_{n+1} = 1 + \min(2K_{\lfloor n/2 \rfloor}, 3K_{\lfloor n/3 \rfloor})$. The induction hypothesis tells us that
$2K_{\lfloor n/2 \rfloor} \ge 2\lfloor n/2 \rfloor$ and $3K_{\lfloor n/3 \rfloor} \ge 3\lfloor n/3 \rfloor$. However, $2\lfloor n/2 \rfloor$ can be as small
as n - 1, and $3\lfloor n/3 \rfloor$ can be as small as n - 2. The most we can conclude
from our induction hypothesis is that $K_{n+1} \ge 1 + (n - 2)$; this falls far short
of $K_{n+1} \ge n + 1$.

We now have reason to worry about the truth of $K_n \ge n$, so let’s try to
disprove it. If we can find an n such that either $2K_{\lfloor n/2 \rfloor} < n$ or $3K_{\lfloor n/3 \rfloor} < n$,
or in other words such that

$$ K_{\lfloor n/2 \rfloor} < n/2 \qquad \text{or} \qquad K_{\lfloor n/3 \rfloor} < n/3 , $$

we will have $K_{n+1} < n + 1$. Can this be possible? We’d better not give the
answer away here, because that will spoil exercise 25.
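
Readers who want to experiment with recurrence (3.16) can tabulate the Knuth numbers mechanically. The sketch below is an illustration only (the function name is made up), and it deliberately does no more than evaluate the recurrence, so exercise 25 stays unspoiled.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def knuth_number(n: int) -> int:
    """Evaluate K_n from K_0 = 1 and K_{n+1} = 1 + min(2 K_floor(n/2), 3 K_floor(n/3))."""
    if n == 0:
        return 1
    m = n - 1                      # K_n is defined in terms of the index m = n - 1
    return 1 + min(2 * knuth_number(m // 2), 3 * knuth_number(m // 3))


if __name__ == "__main__":
    print([knuth_number(n) for n in range(11)])
```

The first eleven values reproduce the sequence 1, 3, 3, 4, 7, 7, 7, 9, 9, 10, 13 listed above.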

Recurrence relations involving floors and/or ceilings arise often in computer
science, because algorithms based on the important technique of “divide
and conquer” often reduce a problem of size n to the solution of similar
problems of integer sizes that are fractions of n. For example, one way to sort
n records, if n > 1, is to divide them into two approximately equal parts, one
of size $\lceil n/2 \rceil$ and the other of size $\lfloor n/2 \rfloor$. (Notice, incidentally, that

$$ n = \lceil n/2 \rceil + \lfloor n/2 \rfloor ; \tag{3.17} $$

this formula comes in handy rather often.) After each part has been sorted
separately (by the same method, applied recursively), we can merge the
records into their final order by doing at most n - 1 further comparisons.
Therefore the total number of comparisons performed is at most f(n), where

$$ \begin{aligned}
f(1) &= 0 ; \\
f(n) &= f(\lceil n/2 \rceil) + f(\lfloor n/2 \rfloor) + n - 1 , \qquad \text{for } n > 1.
\end{aligned} \tag{3.18} $$

A solution to this recurrence appears in exercise 34.
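
Recurrence (3.18) can be evaluated directly with memoization. This is a minimal sketch with an invented function name; it tabulates the bound f(n) and makes no attempt to give the closed form asked for in exercise 34.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def merge_comparisons(n: int) -> int:
    """Upper bound f(n) of (3.18) on comparisons used to merge-sort n records."""
    if n == 1:
        return 0
    upper_half = (n + 1) // 2      # ceil(n/2)
    lower_half = n // 2            # floor(n/2); the halves add up to n by (3.17)
    return merge_comparisons(upper_half) + merge_comparisons(lower_half) + n - 1
```

For example, f(2) = 1, f(3) = 3, f(4) = 5, and f(5) = 8.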

The  Josephus  problem  of  Chapter  1 has  a similar  recurrence,  which  can 
be  cast  in  the  form 


$$ \begin{aligned}
J(1) &= 1 ; \\
J(n) &= 2J(\lfloor n/2 \rfloor) - (-1)^n , \qquad \text{for } n > 1.
\end{aligned} $$

We’ve got more tools to work with than we had in Chapter 1, so let’s
consider the more authentic Josephus problem in which every third person is
eliminated, instead of every second. If we apply the methods that worked in
Chapter 1 to this more difficult problem, we wind up with a recurrence like

$$ J_3(n) = \left\lceil \tfrac32 J_3\bigl(\lfloor \tfrac23 n \rfloor\bigr) + a_n \right\rceil \bmod n + 1 , $$

where ‘mod’ is a function that we will be studying shortly, and where we have
$a_n = -2$, $+1$, or $-\tfrac12$ according as n mod 3 = 0, 1, or 2. But this recurrence
is too horrible to pursue.
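
The every-second-person recurrence for J(n), by contrast, is pleasant to compute with. The sketch below compares it with a direct simulation of the circle; both function names are invented for this illustration, and the simulation takes the step size q as a parameter so that it can be reused for other step sizes.

```python
def josephus_recurrence(n: int) -> int:
    """J(1) = 1; J(n) = 2 J(floor(n/2)) - (-1)^n for n > 1."""
    if n == 1:
        return 1
    return 2 * josephus_recurrence(n // 2) - (-1) ** n


def josephus_simulation(n: int, q: int = 2) -> int:
    """Eliminate every qth person from a circle numbered 1..n; return the survivor."""
    people = list(range(1, n + 1))
    index = 0
    while len(people) > 1:
        index = (index + q - 1) % len(people)   # skip q - 1 people, remove the next
        people.pop(index)
    return people[0]
```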

There’s  another  approach  to  the  Josephus  problem  that  gives  a much 
better  setup.  Whenever  a person  is  passed  over,  we  can  assign  a new  number. 
Thus, 1 and 2 become n + 1 and n + 2, then 3 is executed; 4 and 5 become
n + 3 and n + 4, then 6 is executed; . . . ; 3k + 1 and 3k + 2 become n + 2k + 1
and n + 2k + 2, then 3k + 3 is executed; . . . then 3n is executed (or left to
survive). For example, when n = 10 the numbers are

     1   2   3   4   5   6   7   8   9  10
    11  12      13  14      15  16      17
    18          19  20          21      22
                23  24                  25
                26                      27
                28
                29
                30

The  kth  person  eliminated  ends  up  with  number  3k.  So  we  can  figure  out  who 
the  survivor  is  if  we  can  figure  out  the  original  number  of  person  number  3n. 

If N > n, person number N must have had a previous number, and we
can find it as follows: We have N = n + 2k + 1 or N = n + 2k + 2, hence
$k = \lfloor (N - n - 1)/2 \rfloor$; the previous number was 3k + 1 or 3k + 2, respectively.
That is, it was 3k + (N - n - 2k) = k + N - n. Hence we can calculate the
survivor’s number $J_3(n)$ as follows:

    N := 3n;
    while N > n do N := $\lfloor (N - n - 1)/2 \rfloor + N - n$;
    $J_3(n)$ := N.

This is not a closed form for $J_3(n)$; it’s not even a recurrence. But at least it
tells us how to calculate the answer reasonably fast, if n is large.
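
The three-line procedure translates almost word for word into Python. The following is a sketch under the text's own assumptions; the names `josephus3` and `simulate` are made up here, and the simulation is included only as an independent check.

```python
def josephus3(n: int) -> int:
    """Survivor when every third of n people is eliminated, by renumbering backwards."""
    N = 3 * n                                # the survivor ends up with number 3n
    while N > n:                             # numbers above n are reassigned ones
        N = (N - n - 1) // 2 + N - n         # previous number of person N
    return N


def simulate(n: int, q: int) -> int:
    """Direct simulation: remove every qth person from the circle 1..n."""
    people = list(range(1, n + 1))
    index = 0
    while len(people) > 1:
        index = (index + q - 1) % len(people)
        people.pop(index)
    return people[0]
```

For n = 10 the variable N runs through 30, 29, 28, 26, 23, 19, 13, 4, so `josephus3(10)` returns 4.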


"Not  too  slow, 

not  too  fast.” 

- L. Armstrong

Fortunately there’s a way to simplify this algorithm if we use the variable
D = 3n + 1 - N in place of N. (This change in notation corresponds to
assigning numbers from 3n down to 1, instead of from 1 up to 3n; it’s sort of
like a countdown.) Then the complicated assignment to N becomes

$$ \begin{aligned}
D &:= 3n + 1 - \left( \left\lfloor \frac{(3n + 1 - D) - n - 1}{2} \right\rfloor + (3n + 1 - D) - n \right) \\
  &= n + D - \left\lfloor \frac{2n - D}{2} \right\rfloor = D - \left\lfloor \frac{-D}{2} \right\rfloor = D + \left\lceil \frac{D}{2} \right\rceil = \left\lceil \tfrac32 D \right\rceil ,
\end{aligned} $$

and we can rewrite the algorithm as follows:

    D := 1;
    while $D \le 2n$ do D := $\lceil \tfrac32 D \rceil$;
    $J_3(n)$ := 3n + 1 - D.

Aha! This looks much nicer, because n enters the calculation in a very simple
way. In fact, we can show by the same reasoning that the survivor $J_q(n)$ when
every qth person is eliminated can be calculated as follows:

    D := 1;
    while $D \le (q - 1)n$ do D := $\lceil \tfrac{q}{q-1} D \rceil$;          (3.19)
    $J_q(n)$ := qn + 1 - D.

In the case q = 2 that we know so well, this makes D grow to $2^{m+1}$ when
$n = 2^m + l$; hence $J_2(n) = 2(2^m + l) + 1 - 2^{m+1} = 2l + 1$. Good.
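
Procedure (3.19) is short enough to run as is. In this sketch the ceiling of qD/(q - 1) is computed with integer arithmetic, an implementation choice made here to avoid floating-point rounding; the function names are hypothetical, and the simulation serves only as a cross-check.

```python
def josephus_q(n: int, q: int) -> int:
    """Survivor J_q(n) computed by procedure (3.19); assumes n >= 1 and q >= 2."""
    D = 1
    while D <= (q - 1) * n:
        D = -(-q * D // (q - 1))             # ceil(q*D/(q-1)) in exact integers
    return q * n + 1 - D


def simulate(n: int, q: int) -> int:
    """Direct simulation: remove every qth person from the circle 1..n."""
    people = list(range(1, n + 1))
    index = 0
    while len(people) > 1:
        index = (index + q - 1) % len(people)
        people.pop(index)
    return people[0]
```

With q = 3 and n = 10 the variable D takes the values 1, 2, 3, 5, 8, 12, 18, 27, and the result is 31 - 27 = 4.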

The recipe in (3.19) computes a sequence of integers that can be defined
by the following recurrence:

$$ \begin{aligned}
D_0^{(q)} &= 1 ; \\
D_n^{(q)} &= \left\lceil \frac{q}{q-1} D_{n-1}^{(q)} \right\rceil \qquad \text{for } n > 0.
\end{aligned} \tag{3.20} $$

“Known” like, say,
harmonic numbers.
A. M. Odlyzko and
H. S. Wilf have
shown that
$D_n^{(3)} = \lfloor (\tfrac32)^n C \rfloor$,
where
$C \approx 1.622270503$.

These numbers don’t seem to relate to any familiar functions in a simple
way, except when q = 2; hence they probably don’t have a nice closed form.
But if we’re willing to accept the sequence as “known,” then it’s easy to
describe the solution to the generalized Josephus problem: The survivor $J_q(n)$
is $qn + 1 - D_k^{(q)}$, where k is as small as possible such that $D_k^{(q)} > (q - 1)n$.
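
The sequence of (3.20) is easy to tabulate, and the description of the survivor then becomes a search for the first term that exceeds (q - 1)n. The generator and helper names below are invented for this sketch.

```python
from itertools import islice


def d_values(q: int):
    """Yield D_0, D_1, D_2, ... of recurrence (3.20) for a fixed q >= 2."""
    D = 1
    while True:
        yield D
        D = -(-q * D // (q - 1))             # ceil(q*D/(q-1)) in exact integers


def first_d_values(q: int, count: int) -> list:
    """Return the list [D_0, ..., D_{count-1}]."""
    return list(islice(d_values(q), count))


def survivor(n: int, q: int) -> int:
    """J_q(n) = qn + 1 - D_k for the smallest k with D_k > (q - 1) n."""
    for D in d_values(q):
        if D > (q - 1) * n:
            return q * n + 1 - D
```

For q = 3 the sequence begins 1, 2, 3, 5, 8, 12, 18, 27, 41, 62, 93; for q = 2 it is simply the powers of 2.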


3.4  'MOD':  THE  BINARY  OPERATION 

The quotient of n divided by m is $\lfloor n/m \rfloor$, when m and n are positive
integers. It’s handy to have a simple notation also for the remainder of this
division, and we call it ‘n mod m’. The basic formula

$$ n = m \underbrace{\lfloor n/m \rfloor}_{\text{quotient}} + \underbrace{n \bmod m}_{\text{remainder}} $$

tells us that we can express n mod m as $n - m\lfloor n/m \rfloor$. We can generalize this
to negative integers, and in fact to arbitrary real numbers:

$$ x \bmod y = x - y \lfloor x/y \rfloor , \qquad \text{for } y \ne 0. \tag{3.21} $$

This defines ‘mod’ as a binary operation, just as addition and subtraction are
binary operations. Mathematicians have used mod this way informally for a
long time, taking various quantities mod 10, mod $2\pi$, and so on, but only in
the last twenty years has it caught on formally. Old notion, new notation.

We can easily grasp the intuitive meaning of x mod y, when x and y
are positive real numbers, if we imagine a circle of circumference y whose
points have been assigned real numbers in the interval [0 . . y). If we travel a
distance x around the circle, starting at 0, we end up at x mod y. (And the
number of times we encounter 0 as we go is $\lfloor x/y \rfloor$.)

When x or y is negative, we need to look at the definition carefully in
order to see exactly what it means. Here are some integer-valued examples:

$$ \begin{aligned}
5 \bmod 3 &= 5 - 3\lfloor 5/3 \rfloor = 2 ; \\
5 \bmod -3 &= 5 - (-3)\lfloor 5/(-3) \rfloor = -1 ; \\
-5 \bmod 3 &= -5 - 3\lfloor -5/3 \rfloor = 1 ; \\
-5 \bmod -3 &= -5 - (-3)\lfloor -5/(-3) \rfloor = -2 .
\end{aligned} $$

The number after ‘mod’ is called the modulus; nobody has yet decided what
to call the number before ‘mod’. In applications, the modulus is usually
positive, but the definition makes perfect sense when the modulus is negative.
In both cases the value of x mod y is between 0 and the modulus:

$$ \begin{aligned}
0 \le x \bmod y < y , &\qquad \text{for } y > 0; \\
0 \ge x \bmod y > y , &\qquad \text{for } y < 0.
\end{aligned} $$

What about y = 0? Definition (3.21) leaves this case undefined, in order to
avoid division by zero, but to be complete we can define

$$ x \bmod 0 = x . \tag{3.22} $$

This convention preserves the property that x mod y always differs from x by
a multiple of y. (It might seem more natural to make the function continuous
at 0, by defining $x \bmod 0 = \lim_{y \to 0} x \bmod y = 0$. But we’ll see in Chapter 4


Why  do  they  call  it 
'mod':  The  Binary 
Operation?  Stay 
tuned  to  find  out  in 
the  next,  exciting, 
chapter! 


Beware  of  computer 
languages  that  use 
another  definition, 


How  about  calling 
the  other  number 
the modumor?

There was a time in
the 70s when ‘mod’
was the fashion.
Maybe the new
mumble function
should be called
‘punk’?

No - I like
‘mumble’.


The  remainder,  eh? 


that  this  would  be  much  less  useful.  Continuity  is  not  an  important  aspect 
of  the  mod  operation.) 
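
Definitions (3.21) and (3.22) fit in a few lines of Python. The function name `mod` is chosen here for the illustration. Python's built-in `//` is floor division, so its `%` operator on integers already agrees with (3.21) for nonzero moduli; C, by contrast, truncates integer division toward zero (since C99), so there `-5 % 3` is -2 rather than 1.

```python
from fractions import Fraction


def mod(x, y):
    """x mod y = x - y*floor(x/y) for y != 0, and x mod 0 = x."""
    if y == 0:
        return x                   # convention (3.22)
    return x - y * (x // y)        # // rounds toward minus infinity in Python


if __name__ == "__main__":
    # The four integer examples from the text.
    print(mod(5, 3), mod(5, -3), mod(-5, 3), mod(-5, -3))
    # The fractional part of 7/2, written as 7/2 mod 1.
    print(mod(Fraction(7, 2), 1))
```

Using `Fraction` arguments keeps real-valued examples exact; with floats the same function works but is subject to rounding.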

We’ve already seen one special case of mod in disguise, when we wrote x
in terms of its integer and fractional parts, $x = \lfloor x \rfloor + \{x\}$. The fractional part
can also be written x mod 1, because we have

$$ x = \lfloor x \rfloor + x \bmod 1 . $$

Notice that parentheses aren’t needed in this formula; we take mod to bind
more tightly than addition or subtraction.

The floor function has been used to define mod, and the ceiling function
hasn’t gotten equal time. We could perhaps use the ceiling to define a mod
analog like

$$ x \mathbin{\text{mumble}} y = y \lceil x/y \rceil - x ; $$

in our circle analogy this represents the distance the traveler needs to continue,
after going a distance x, to get back to the starting point 0. But of course
we’d need a better name than ‘mumble’. If sufficient applications come along,
an appropriate name will probably suggest itself.

The distributive law is mod’s most important algebraic property: We
have

$$ c(x \bmod y) = (cx) \bmod (cy) \tag{3.23} $$

for all real c, x, and y. (Those who like mod to bind less tightly than
multiplication may remove the parentheses from the right side here, too.) It’s easy
to prove this law from definition (3.21), since

$$ c(x \bmod y) = c(x - y\lfloor x/y \rfloor) = cx - cy\lfloor cx/cy \rfloor = cx \bmod cy , $$

if $cy \ne 0$; and the zero-modulus cases are trivially true. Our four examples
using ±5 and ±3 illustrate this law twice, with c = -1. An identity like
(3.23) is reassuring, because it gives us reason to believe that ‘mod’ has not
been defined improperly.
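
The distributive law (3.23) can be spot-checked with exact rationals, including negative multipliers and the zero-modulus cases. The sketch assumes the floor-based definition (3.21) together with convention (3.22); the function names are invented here.

```python
from fractions import Fraction
from itertools import product


def mod(x, y):
    """x mod y = x - y*floor(x/y) for y != 0, and x mod 0 = x."""
    if y == 0:
        return x
    return x - y * (x // y)


def distributive_law_holds(c, x, y) -> bool:
    """Check c*(x mod y) == (c*x) mod (c*y) for one triple of exact values."""
    return c * mod(x, y) == mod(c * x, c * y)


if __name__ == "__main__":
    samples = [Fraction(p, 4) for p in range(-10, 11)]
    print(all(distributive_law_holds(c, x, y)
              for c, x, y in product(samples, repeat=3)))
```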

In  the  remainder  of  this  section,  we’ll  consider  an  application  in  which 
‘mod’  turns  out  to  be  helpful  although  it  doesn’t  play  a central  role.  The 
problem  arises  frequently  in  a variety  of  situations:  We  want  to  partition 
n things  into  m groups  as  equally  as  possible. 

Suppose, for example, that we have n short lines of text that we’d like
to arrange in m columns. For aesthetic reasons, we want the columns to be
arranged in decreasing order of length (actually nonincreasing order); and the
lengths should be approximately the same: no two columns should differ by
more than one line’s worth of text. If 37 lines of text are being divided into
five columns, we would therefore prefer the arrangement on the right:


       8         8         8         8         5     |       8         8         7         7         7

    line 1    line 9    line 17   line 25   line 33  |    line 1    line 9    line 17   line 24   line 31
    line 2    line 10   line 18   line 26   line 34  |    line 2    line 10   line 18   line 25   line 32
    line 3    line 11   line 19   line 27   line 35  |    line 3    line 11   line 19   line 26   line 33
    line 4    line 12   line 20   line 28   line 36  |    line 4    line 12   line 20   line 27   line 34
    line 5    line 13   line 21   line 29   line 37  |    line 5    line 13   line 21   line 28   line 35
    line 6    line 14   line 22   line 30            |    line 6    line 14   line 22   line 29   line 36
    line 7    line 15   line 23   line 31            |    line 7    line 15   line 23   line 30   line 37
    line 8    line 16   line 24   line 32            |    line 8    line 16

Furthermore we want to distribute the lines of text columnwise, first deciding
how many lines go into the first column and then moving on to the second,
the third, and so on, because that’s the way people read. Distributing row
by row would give us the correct number of lines in each column, but the
ordering would be wrong. (We would get something like the arrangement on
the right, but column 1 would contain lines 1, 6, 11, . . . , 36, instead of lines
1, 2, 3, . . . , 8 as desired.)

A row-by-row distribution strategy can’t be used, but it does tell us how
many lines to put in each column. If n is not a multiple of m, the
row-by-row procedure makes it clear that the long columns should each contain
$\lceil n/m \rceil$ lines, and the short columns should each contain $\lfloor n/m \rfloor$. There will
be exactly n mod m long columns (and, as it turns out, there will be exactly
n mumble m short ones).

Let’s generalize the terminology and talk about ‘things’ and ‘groups’
instead of ‘lines’ and ‘columns’. We have just decided that the first group
should contain $\lceil n/m \rceil$ things; therefore the following sequential distribution
scheme ought to work: To distribute n things into m groups, when m > 0,
put $\lceil n/m \rceil$ things into one group, then use the same procedure recursively to
put the remaining $n' = n - \lceil n/m \rceil$ things into $m' = m - 1$ additional groups.

For example, if n = 314 and m = 6, the distribution goes like this:

    remaining things    remaining groups    ⌈things/groups⌉

          314                  6                  53
          261                  5                  53
          208                  4                  52
          156                  3                  52
          104                  2                  52
           52                  1                  52

It  works.  We  get  groups  of  approximately  the  same  size,  even  though  the 
divisor  keeps  changing. 
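
The sequential scheme can be written as a short loop. This is a sketch of the procedure just described, with a function name invented here; the ceiling is computed in integer arithmetic.

```python
def distribute(n: int, m: int) -> list:
    """Split n things into m groups by repeatedly taking ceil(remaining/groups)."""
    sizes = []
    while m > 0:
        first = -(-n // m)         # ceil(n/m) things go into the next group
        sizes.append(first)
        n -= first                 # things still to be placed
        m -= 1                     # groups still to be filled
    return sizes


if __name__ == "__main__":
    print(distribute(314, 6))      # the example tabulated above
    print(distribute(37, 5))       # the column lengths for 37 lines in 5 columns
```

With n = 314 and m = 6 the group sizes are 53, 53, 52, 52, 52, 52, and with n = 37 and m = 5 they are 8, 8, 7, 7, 7.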

Why does it work? In general we can suppose that n = qm + r, where
$q = \lfloor n/m \rfloor$ and r = n mod m. The process is simple if r = 0: We put
$\lceil n/m \rceil = q$ things into the first group and replace n by $n' = n - q$, leaving
$n' = qm'$ things to put into the remaining $m' = m - 1$ groups. And if
r > 0, we put $\lceil n/m \rceil = q + 1$ things into the first group and replace n
by $n' = n - q - 1$, leaving $n' = qm' + r - 1$ things for subsequent groups.
The new remainder is $r' = r - 1$, but q stays the same. It follows that there
will be r groups with q + 1 things, followed by m - r groups with q things.

How many things are in the kth group? We’d like a formula that gives
$\lceil n/m \rceil$ when $k \le n \bmod m$, and $\lfloor n/m \rfloor$ otherwise. It’s not hard to verify
that

$$ \left\lceil \frac{n - k + 1}{m} \right\rceil $$

has the desired properties, because this reduces to $q + \lceil (r - k + 1)/m \rceil$ if we
write n = qm + r as in the preceding paragraph; here $q = \lfloor n/m \rfloor$. We have
$\lceil (r - k + 1)/m \rceil = [k \le r]$, if $1 \le k \le m$ and $0 \le r < m$. Therefore we can
write an identity that expresses the partition of n into m as-equal-as-possible
parts in nonincreasing order:

$$ n = \left\lceil \frac{n}{m} \right\rceil + \left\lceil \frac{n - 1}{m} \right\rceil + \cdots + \left\lceil \frac{n - m + 1}{m} \right\rceil . \tag{3.24} $$

This identity is valid for all positive integers m, and for all integers n (whether
positive, negative, or zero). We have already encountered the case m = 2 in
(3.17), although we wrote it in a slightly different form, $n = \lceil n/2 \rceil + \lfloor n/2 \rfloor$.

If we had wanted the parts to be in nondecreasing order, with the small
groups coming before the larger ones, we could have proceeded in the same
way but with $\lfloor n/m \rfloor$ things in the first group. Then we would have derived
the corresponding identity

$$ n = \left\lfloor \frac{n}{m} \right\rfloor + \left\lfloor \frac{n + 1}{m} \right\rfloor + \cdots + \left\lfloor \frac{n + m - 1}{m} \right\rfloor . \tag{3.25} $$

Some claim that it’s
too dangerous to
replace anything by
an mx.

It’s possible to convert between (3.25) and (3.24) by using either (3.4) or the
identity of exercise 12.
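
Identities (3.24) and (3.25) are claimed for all integers n, including negative ones, so a machine check over a range of signs is a useful sanity test. The helper names in this sketch are made up; Python's `//` supplies the floor, and the ceiling is obtained by negating twice.

```python
def ceil_div(a: int, m: int) -> int:
    """ceil(a/m) for integer a and positive integer m."""
    return -(-a // m)


def parts_nonincreasing(n: int, m: int) -> list:
    """The m terms ceil((n - k + 1)/m), k = 1..m, of identity (3.24)."""
    return [ceil_div(n - k + 1, m) for k in range(1, m + 1)]


def parts_nondecreasing(n: int, m: int) -> list:
    """The m terms floor((n + k)/m), k = 0..m-1, of identity (3.25)."""
    return [(n + k) // m for k in range(m)]
```

For n = 37 and m = 5 the first function gives 8, 8, 7, 7, 7 and the second gives 7, 7, 7, 8, 8.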

Now if we replace n in (3.25) by $\lfloor mx \rfloor$, and apply rule (3.11) to remove
floors inside of floors, we get an identity that holds for all real x:

$$ \lfloor mx \rfloor = \lfloor x \rfloor + \left\lfloor x + \frac{1}{m} \right\rfloor + \cdots + \left\lfloor x + \frac{m - 1}{m} \right\rfloor . \tag{3.26} $$

This is rather amazing, because the floor function is an integer approximation
of a real value, but the single approximation on the left equals the sum of a
bunch of them on the right. If we assume that $\lfloor x \rfloor$ is roughly $x - \tfrac12$ on the
average, the left-hand side is roughly $mx - \tfrac12$, while the right-hand side comes
to roughly $(x - \tfrac12) + (x - \tfrac12 + \tfrac1m) + \cdots + (x - \tfrac12 + \tfrac{m-1}{m}) = mx - \tfrac12$; the sum of
all these rough approximations turns out to be exact!


3.5 FLOOR/CEILING SUMS

Equation (3.26) demonstrates that it’s possible to get a closed form
for at least one kind of sum that involves $\lfloor\ \rfloor$. Are there others? Yes. The
trick that usually works in such cases is to get rid of the floor or ceiling by
introducing a new variable.
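
Identity (3.26) itself can be spot-checked with exact rational x before we move on. This sketch uses `fractions.Fraction` to avoid rounding; the function name `shifted_floor_sum` is hypothetical.

```python
from fractions import Fraction
from math import floor


def shifted_floor_sum(x: Fraction, m: int) -> int:
    """Right-hand side of (3.26): the sum of floor(x + k/m) for k = 0..m-1."""
    return sum(floor(x + Fraction(k, m)) for k in range(m))


if __name__ == "__main__":
    x = Fraction(22, 7)
    for m in range(1, 6):
        print(m, floor(m * x), shifted_floor_sum(x, m))   # the two columns agree
```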

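For example, let’s see if it’s possible to do the sum

$$ \sum_{0 \le k < n} \lfloor \sqrt{k} \rfloor $$

in closed form. One idea is to introduce the variable $m = \lfloor \sqrt{k} \rfloor$; we can do
this “mechanically” by proceeding as we did in the roulette problem:

$$ \begin{aligned}
\sum_{0 \le k < n} \lfloor \sqrt{k} \rfloor
  &= \sum_{k, m \ge 0} m [k < n] [m = \lfloor \sqrt{k} \rfloor] \\
  &= \sum_{k, m \ge 0} m [k < n] [m \le \sqrt{k} < m + 1] \\
  &= \sum_{k, m \ge 0} m [k < n] [m^2 \le k < (m + 1)^2] \\
  &= \sum_{k, m \ge 0} m [m^2 \le k < (m + 1)^2 \le n] + \sum_{k, m \ge 0} m [m^2 \le k < n < (m + 1)^2] .
\end{aligned} $$

Once again the boundary conditions are a bit delicate. Let’s assume first that
$n = a^2$ is a perfect square. Then the second sum is zero, and the first can be
evaluated by our usual routine:

$$ \begin{aligned}
\sum_{k, m \ge 0} m [m^2 \le k < (m + 1)^2 \le a^2]
  &= \sum_{m \ge 0} m \bigl( (m + 1)^2 - m^2 \bigr) [m + 1 \le a] \\
  &= \sum_{m \ge 0} m (2m + 1) [m < a] \\
  &= \sum_{m \ge 0} (2m^{\underline{2}} + 3m^{\underline{1}}) [m < a] \\
  &= \sum\nolimits_0^a (2m^{\underline{2}} + 3m^{\underline{1}}) \, \delta m \\
  &= \tfrac23 a(a - 1)(a - 2) + \tfrac32 a(a - 1) = \tfrac16 (4a + 1) a (a - 1) .
\end{aligned} $$

Falling powers
make the sum come
tumbling down.

In the general case we can let $a = \lfloor \sqrt{n} \rfloor$; then we merely need to add
the terms for $a^2 \le k < n$, which are all equal to a, so they sum to $(n - a^2)a$.
This gives the desired closed form,

$$ \sum_{0 \le k < n} \lfloor \sqrt{k} \rfloor = na - \tfrac13 a^3 - \tfrac12 a^2 - \tfrac16 a , \qquad a = \lfloor \sqrt{n} \rfloor . \tag{3.27} $$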
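
Closed form (3.27) is a good candidate for a brute-force check. In the sketch below, which uses invented function names, the three fractional terms are combined into the single integer expression a(a + 1)(2a + 1)/6, which equals $\tfrac13 a^3 + \tfrac12 a^2 + \tfrac16 a$.

```python
from math import isqrt


def sqrt_floor_sum_brute(n: int) -> int:
    """Sum of floor(sqrt(k)) over 0 <= k < n, term by term."""
    return sum(isqrt(k) for k in range(n))


def sqrt_floor_sum_closed(n: int) -> int:
    """Closed form (3.27) with a = floor(sqrt(n))."""
    a = isqrt(n)
    # a^3/3 + a^2/2 + a/6 = a(a+1)(2a+1)/6, which is always an integer.
    return n * a - a * (a + 1) * (2 * a + 1) // 6
```

For n = 5 both functions return 0 + 1 + 1 + 1 + 2 = 5.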


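Another approach to such sums is to replace an expression of the form
$\lfloor x \rfloor$ by $\sum_j [1 \le j \le x]$; this is legal whenever $x \ge 0$. Here’s how that method
works in the sum of $\lfloor$square roots$\rfloor$, if we assume for convenience that $n = a^2$:

$$ \begin{aligned}
\sum_{0 \le k < a^2} \lfloor \sqrt{k} \rfloor
  &= \sum_{j, k} [1 \le j \le \sqrt{k}] [0 \le k < a^2] \\
  &= \sum_{1 \le j < a} \sum_k [j^2 \le k < a^2] \\
  &= \sum_{1 \le j < a} (a^2 - j^2) = a^3 - \tfrac13 a (a + \tfrac12)(a + 1) .
\end{aligned} $$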


Warning: This stuff
is fairly advanced.
Better skim the
next two pages on
first reading; they
aren’t crucial.
- Friendly TA

Start
Skimming

Now here’s another example where a change of variable leads to a
transformed sum. A remarkable theorem was discovered independently by three
mathematicians (Bohl [28], Sierpiński [265], and Weyl [300]) at about the
same time in 1909: If $\alpha$ is irrational then the fractional parts $\{n\alpha\}$ are very
uniformly distributed between 0 and 1, as $n \to \infty$. One way to state this is that

$$ \lim_{n \to \infty} \frac1n \sum_{0 \le k < n} f(\{k\alpha\}) = \int_0^1 f(x) \, dx \tag{3.28} $$

for all irrational $\alpha$ and all functions f that are continuous almost everywhere.
For example, the average value of $\{n\alpha\}$ can be found by setting f(x) = x; we
get $\tfrac12$. (That’s exactly what we might expect; but it’s nice to know that it is
really, provably true, no matter how irrational $\alpha$ is.)
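
A numerical experiment makes (3.28) plausible, though it proves nothing. The sketch below fixes $\alpha = \sqrt2$ as an arbitrary example and uses floating-point arithmetic, so the averages are approximate; the function names are invented here.

```python
from math import isqrt, sqrt


def frac_multiple_sqrt2(k: int) -> float:
    """Fractional part {k*sqrt(2)}; the integer part isqrt(2*k*k) is exact."""
    return k * sqrt(2) - isqrt(2 * k * k)


def average(f, n: int) -> float:
    """Left side of (3.28) before the limit: mean of f({k*sqrt(2)}) for 0 <= k < n."""
    return sum(f(frac_multiple_sqrt2(k)) for k in range(n)) / n


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        # Compare with the integrals of x and x^2 over [0, 1]: 1/2 and 1/3.
        print(n, average(lambda x: x, n), average(lambda x: x * x, n))
```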

The theorem of Bohl, Sierpiński, and Weyl is proved by approximating
f(x) above and below by “step functions,” which are linear combinations of
the simple functions

$$ f_v(x) = [0 \le x < v] $$

when $0 \le v \le 1$. Our purpose here is not to prove the theorem; that’s a job
for calculus books. But let’s try to figure out the basic reason why it holds,
by seeing how well it works in the special case $f(x) = f_v(x)$. In other words,
let’s try to see how close the sum

$$ \sum_{0 \le k < n} [\{k\alpha\} < v] $$

gets to the “ideal” value nv, when n is large and $\alpha$ is irrational.

For this purpose we define the discrepancy $D(\alpha, n)$ to be the maximum
absolute value, over all $0 \le v \le 1$, of the sum

$$ s(\alpha, n, v) = \sum_{0 \le k < n} \bigl( [\{k\alpha\} < v] - v \bigr) . \tag{3.29} $$

Our goal is to show that $D(\alpha, n)$ is “not too large” when compared with n,
by showing that $|s(\alpha, n, v)|$ is always reasonably small.
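
The discrepancy can be estimated numerically, which gives a feeling for how small it is. The sketch below again takes $\alpha = \sqrt2$ as an example and works in floating point; it relies on the observation (an assumption of this sketch, not a statement from the text) that $s(\alpha, n, v)$ is piecewise linear in v and jumps only at the fractional parts, so only those points need to be examined.

```python
from math import isqrt, sqrt


def discrepancy_sqrt2(n: int) -> float:
    """Approximate D(sqrt(2), n), the largest |s(alpha, n, v)| over 0 <= v <= 1."""
    fracs = sorted(k * sqrt(2) - isqrt(2 * k * k) for k in range(n))
    best = 0.0
    for i, f in enumerate(fracs):
        # At v = f exactly i fractional parts are < v, so s = i - n*f;
        # just above f there are i + 1 of them, so s approaches i + 1 - n*f.
        best = max(best, abs(i - n * f), abs(i + 1 - n * f))
    return best


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, discrepancy_sqrt2(n))
```

In such experiments the value stays tiny compared with n, in line with the goal stated above.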

First we can rewrite $s(\alpha, n, v)$ in simpler form, then introduce a new
index variable j:

$$ \begin{aligned}
\sum_{0 \le k < n} \bigl( [\{k\alpha\} < v] - v \bigr)
  &= \sum_{0 \le k < n} \bigl( \lfloor k\alpha \rfloor - \lfloor k\alpha - v \rfloor - v \bigr) \\
  &= -nv + \sum_{0 \le k < n} \sum_j [k\alpha - v < j \le k\alpha] \\
  &= -nv + \sum_{0 \le j < \lceil n\alpha \rceil} \sum_{k < n} [j\alpha^{-1} \le k < (j + v)\alpha^{-1}] .
\end{aligned} $$

If we’re lucky, we can do the sum on k. But we ought to introduce some
new variables, so that the formula won’t be such a mess. Without loss of
generality, we can assume that $0 < \alpha < 1$; let us write

$$ \begin{aligned}
a &= \lfloor \alpha^{-1} \rfloor , & \alpha^{-1} &= a + \alpha' ; \\
b &= \lceil v\alpha^{-1} \rceil , & v\alpha^{-1} &= b - v' .
\end{aligned} $$

Thus $\alpha' = \{\alpha^{-1}\}$ is the fractional part of $\alpha^{-1}$, and $v'$ is the mumble-fractional
part of $v\alpha^{-1}$.

Right, name and
conquer.

The change of variable
from k to j is
the main point.
- Friendly TA

Once again the boundary conditions are our only source of grief. For
now, let’s forget the restriction ‘k < n’ and evaluate the sum on k without it:

$$ \begin{aligned}
\sum_k \bigl[ k \in [j\alpha^{-1} .. (j + v)\alpha^{-1}) \bigr]
  &= \lceil (j + v)(a + \alpha') \rceil - \lceil j(a + \alpha') \rceil \\
  &= b + \lceil j\alpha' - v' \rceil - \lceil j\alpha' \rceil .
\end{aligned} $$

OK, that’s pretty simple; we plug it in and plug away:

$$ s(\alpha, n, v) = -nv + \lceil n\alpha \rceil b + \sum_{0 \le j < \lceil n\alpha \rceil} \bigl( \lceil j\alpha' - v' \rceil - \lceil j\alpha' \rceil \bigr) - S , \tag{3.30} $$

where S is a correction for the cases with $k \ge n$ that we have failed to exclude.
The quantity $j\alpha'$ will never be an integer, since $\alpha$ (hence $\alpha'$) is irrational; and
$j\alpha' - v'$ will be an integer for at most one value of j. So we can change the
ceiling terms to floors:

$$ s(\alpha, n, v) = -nv + \lceil n\alpha \rceil b - \sum_{0 \le j < \lceil n\alpha \rceil} \bigl( \lfloor j\alpha' \rfloor - \lfloor j\alpha' - v' \rfloor \bigr) - S + [0 \text{ or } 1] . $$

(The formula
[0 or 1] stands
for something that’s
either 0 or 1; we
needn’t commit
ourselves, because
the details don’t
really matter.)

Stop
Skimming

Interesting. Instead of a closed form, we’re getting a sum that looks rather
like $s(\alpha, n, v)$ but with different parameters: $\alpha'$ instead of $\alpha$, $\lceil n\alpha \rceil$ instead
of n, and $v'$ instead of v. So we’ll have a recurrence for $s(\alpha, n, v)$, which
(hopefully) will lead to a recurrence for the discrepancy $D(\alpha, n)$. This means
we want to get

$$ s(\alpha', \lceil n\alpha \rceil, v') = \sum_{0 \le j < \lceil n\alpha \rceil} \bigl( \lfloor j\alpha' \rfloor - \lfloor j\alpha' - v' \rfloor - v' \bigr) $$

into the act:

$$ s(\alpha, n, v) = -nv + \lceil n\alpha \rceil b - \lceil n\alpha \rceil v' - s(\alpha', \lceil n\alpha \rceil, v') - S + [0 \text{ or } 1] . $$

Recalling that $b - v' = v\alpha^{-1}$, we see that everything will simplify beautifully
if we replace $\lceil n\alpha \rceil (b - v')$ by $n\alpha (b - v') = nv$:

$$ s(\alpha, n, v) = -s(\alpha', \lceil n\alpha \rceil, v') - S + \epsilon + [0 \text{ or } 1] . $$

Here $\epsilon$ is a positive error of at most $v\alpha^{-1}$. Exercise 18 proves that S is,
likewise, between 0 and $\alpha^{-1}$. We can also remove the term for
$j = \lceil n\alpha \rceil - 1 = \lfloor n\alpha \rfloor$ from the sum, since it contributes either $v'$ or $v' - 1$. Hence, if we take
the maximum of absolute values over all v, we get

$$ D(\alpha, n) \le D(\alpha', \lfloor \alpha n \rfloor) + \alpha^{-1} + 2 . \tag{3.31} $$

The methods we’ll learn in succeeding chapters will allow us to conclude
from this recurrence that $D(\alpha, n)$ is always much smaller than n, when n is
sufficiently large. Hence the theorem (3.28) is not only true, it can also be
strengthened: Convergence to the limit is very fast.

Whew; that was quite an exercise in manipulation of sums, floors, and
ceilings. Readers who are not accustomed to “proving that errors are small”
might find it hard to believe that anybody would have the courage to keep
going, when faced with such weird-looking sums. But actually, a second look
shows that there’s a simple motivating thread running through the whole
calculation. The main idea is that a certain sum $s(\alpha, n, v)$ of n terms can be
reduced to a similar sum of at most $\lceil \alpha n \rceil$ terms. Everything else cancels out
except for a small residual left over from terms near the boundaries.

Let’s take a deep breath now and do one more sum, which is not trivial
but has the great advantage (compared with what we’ve just been doing) that
it comes out in closed form so that we can easily check the answer. Our goal
now will be to generalize the sum in (3.26) by finding an expression for

$$ \sum_{0 \le k < m} \left\lfloor \frac{nk + x}{m} \right\rfloor , \qquad \text{integer } m > 0, \text{ integer } n. $$

Is  this  a harder  sum 
of  floors,  or  a sum 
of  harder  floors? 


Finding  a closed  form  for  this  sum  is  tougher  than  what  we’ve  done  so  far 
(except  perhaps  for  the  discrepancy  problem  we  just  looked  at).  But  it’s 
instructive,  so  we’ll  hack  away  at  it  for  the  rest  of  this  chapter. 

As  usual,  especially  with  tough  problems,  we  start  by  looking  at  small 
cases.  The  special  case  n = 1 is  (3.26),  with  x replaced  by  x/m: 


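$$\left\lfloor \frac{x}{m} \right\rfloor + \left\lfloor \frac{1 + x}{m} \right\rfloor + \cdots + \left\lfloor \frac{m - 1 + x}{m} \right\rfloor = \lfloor x \rfloor .$$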


And  as  in  Chapter  1,  we  find  it  useful  to  get  more  data  by  generalizing 
downwards  to  the  case  n = 0: 


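$$\left\lfloor \frac{x}{m} \right\rfloor + \left\lfloor \frac{x}{m} \right\rfloor + \cdots + \left\lfloor \frac{x}{m} \right\rfloor = m \left\lfloor \frac{x}{m} \right\rfloor .$$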

Our  problem  has  two  parameters,  m and  n;  let’s  look  at  some  small  cases 
for m. When m = 1 there’s just a single term in the sum and its value is $\lfloor x \rfloor$.
When m = 2 the sum is $\lfloor x/2 \rfloor + \lfloor (x + n)/2 \rfloor$. We can remove the interaction
between  x and  n by  removing  n from  inside  the  floor  function,  but  to  do  that 
we  must  consider  even  and  odd  n separately.  If  n is  even,  n/2  is  an  integer, 
so  we  can  remove  it  from  the  floor: 


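$$\left\lfloor \frac{x}{2} \right\rfloor + \left( \left\lfloor \frac{x}{2} \right\rfloor + \frac{n}{2} \right) = 2 \left\lfloor \frac{x}{2} \right\rfloor + \frac{n}{2} .$$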

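If n is odd, $(n - 1)/2$ is an integer so we get

$$\left\lfloor \frac{x}{2} \right\rfloor + \left( \left\lfloor \frac{x + 1}{2} \right\rfloor + \frac{n - 1}{2} \right) = \lfloor x \rfloor + \frac{n - 1}{2} .$$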


The  last  step  follows  from  (3.26)  with  m = 2. 

These  formulas  for  even  and  odd  n slightly  resemble  those  for  n = 0 and  1 , 
but  no  clear  pattern  has  emerged  yet;  so  we  had  better  continue  exploring 
some  more  small  cases.  For  m = 3 the  sum  is 


$$\left\lfloor \frac{x}{3} \right\rfloor + \left\lfloor \frac{x + n}{3} \right\rfloor + \left\lfloor \frac{x + 2n}{3} \right\rfloor ,$$

and we consider three cases for n: Either it’s a multiple of 3, or it’s 1 more
than a multiple, or it’s 2 more. That is, n mod 3 = 0, 1, or 2. If n mod 3 = 0
then n/3 and 2n/3 are integers, so the sum is

$$\left\lfloor \frac{x}{3} \right\rfloor + \left( \left\lfloor \frac{x}{3} \right\rfloor + \frac{n}{3} \right) + \left( \left\lfloor \frac{x}{3} \right\rfloor + \frac{2n}{3} \right) = 3 \left\lfloor \frac{x}{3} \right\rfloor + n .$$

Be forewarned: This
is the beginning of
a pattern, in that
the last part of the
chapter consists
of the solution of
some long, difficult
problem, with little
more motivation
than curiosity.

-Students

Touché. But c’mon,
gang, do you always
need to be told
about applications
before you can get
interested in something?
This sum
arises, for example,
in the study of
random number
generation and
testing. But mathematicians
looked
at it long before
computers came
along, because they
found it natural to
ask if there’s a way
to sum arithmetic
progressions that
have been “floored.”
-Your instructor

“Inventive genius
requires pleasurable
mental activity as
a condition for its
vigorous exercise.
‘Necessity is the
mother of invention’
is a silly proverb.
‘Necessity is the
mother of futile
dodges’ is much
nearer to the truth.
The basis of the
growth of modern
invention is science,
and science is almost
wholly the
outgrowth of pleasurable
intellectual
curiosity.”

-A. N. Whitehead [303]

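If n mod 3 = 1 then $(n - 1)/3$ and $(2n - 2)/3$ are integers, so we have

$$\left\lfloor \frac{x}{3} \right\rfloor + \left( \left\lfloor \frac{x + 1}{3} \right\rfloor + \frac{n - 1}{3} \right) + \left( \left\lfloor \frac{x + 2}{3} \right\rfloor + \frac{2n - 2}{3} \right) = \lfloor x \rfloor + n - 1 .$$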


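Again this last step follows from (3.26), this time with m = 3. And finally, if
n mod 3 = 2 then

$$\left\lfloor \frac{x}{3} \right\rfloor + \left( \left\lfloor \frac{x + 2}{3} \right\rfloor + \frac{n - 2}{3} \right) + \left( \left\lfloor \frac{x + 1}{3} \right\rfloor + \frac{2n - 1}{3} \right) = \lfloor x \rfloor + n - 1 .$$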


The  left  hemispheres  of  our  brains  have  finished  the  case  m = 3,  but  the 
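right hemispheres still can’t recognize the pattern, so we proceed to m = 4:

$$\left\lfloor \frac{x}{4} \right\rfloor + \left\lfloor \frac{x + n}{4} \right\rfloor + \left\lfloor \frac{x + 2n}{4} \right\rfloor + \left\lfloor \frac{x + 3n}{4} \right\rfloor .$$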


At  least  we  know  enough  by  now  to  consider  cases  based  on  n mod  m.  If 
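n mod 4 = 0 then

$$\left\lfloor \frac{x}{4} \right\rfloor + \left( \left\lfloor \frac{x}{4} \right\rfloor + \frac{n}{4} \right) + \left( \left\lfloor \frac{x}{4} \right\rfloor + \frac{2n}{4} \right) + \left( \left\lfloor \frac{x}{4} \right\rfloor + \frac{3n}{4} \right) = 4 \left\lfloor \frac{x}{4} \right\rfloor + \frac{3n}{2} .$$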


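And if n mod 4 = 1,

$$\left\lfloor \frac{x}{4} \right\rfloor + \left( \left\lfloor \frac{x + 1}{4} \right\rfloor + \frac{n - 1}{4} \right) + \left( \left\lfloor \frac{x + 2}{4} \right\rfloor + \frac{2n - 2}{4} \right) + \left( \left\lfloor \frac{x + 3}{4} \right\rfloor + \frac{3n - 3}{4} \right) = \lfloor x \rfloor + \frac{3n}{2} - \frac{3}{2} .$$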


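The case n mod 4 = 3 turns out to give the same answer. Finally, in the case
n mod 4 = 2 we get something a bit different, and this turns out to be an
important clue to the behavior in general:

$$\left\lfloor \frac{x}{4} \right\rfloor + \left( \left\lfloor \frac{x + 2}{4} \right\rfloor + \frac{n - 2}{4} \right) + \left( \left\lfloor \frac{x}{4} \right\rfloor + \frac{2n}{4} \right) + \left( \left\lfloor \frac{x + 2}{4} \right\rfloor + \frac{3n - 2}{4} \right) = 2 \left( \left\lfloor \frac{x}{4} \right\rfloor + \left\lfloor \frac{x + 2}{4} \right\rfloor \right) + \frac{3n}{2} - 1 = 2 \left\lfloor \frac{x}{2} \right\rfloor + \frac{3n}{2} - 1 .$$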


This last step simplifies something of the form $\lfloor y/2 \rfloor + \lfloor (y + 1)/2 \rfloor$, which
again  is  a special  case  of  (3.26). 




To  summarize,  here’s  the  value  of  our  sum  for  small  m: 


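 m | n mod m = 0                   | n mod m = 1                     | n mod m = 2                      | n mod m = 3
 1 | \lfloor x \rfloor
 2 | 2\lfloor x/2 \rfloor + n/2    | \lfloor x \rfloor + n/2 - 1/2
 3 | 3\lfloor x/3 \rfloor + n      | \lfloor x \rfloor + n - 1       | \lfloor x \rfloor + n - 1
 4 | 4\lfloor x/4 \rfloor + 3n/2   | \lfloor x \rfloor + 3n/2 - 3/2  | 2\lfloor x/2 \rfloor + 3n/2 - 1  | \lfloor x \rfloor + 3n/2 - 3/2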


It  looks  as  if  we’re  getting  something  of  the  form 


$$a \left\lfloor \frac{x}{a} \right\rfloor + bn + c ,$$


where  a,  b,  and  c somehow  depend  on  m and  n.  Even  the  myopic  among 
us  can  see  that  b is  probably  (m  — 1)/2.  It’s  harder  to  discern  an  expression 
for  a;  but  the  case  n mod  4 = 2 gives  us  a hint  that  a is  probably  gcd(m,  n), 
the  greatest  common  divisor  of  m and  n.  This  makes  sense  because  gcd(m,  n) 
is  the  factor  we  remove  from  m and  n when  reducing  the  fraction  n/m  to 
lowest  terms,  and  our  sum  involves  the  fraction  n/m.  (We’ll  look  carefully 
at  gcd  operations  in  Chapter  4.)  The  value  of  c seems  more  mysterious,  but 
perhaps  it  will  drop  out  of  our  proofs  for  a and  b. 

In  computing  the  sum  for  small  m,  we’ve  effectively  rewritten  each  term 
of  the  sum  as 


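$$\left\lfloor \frac{x + kn}{m} \right\rfloor = \left\lfloor \frac{x + kn \bmod m}{m} \right\rfloor + \frac{kn}{m} - \frac{kn \bmod m}{m} ,$$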

because  (kn  — kn  mod  m)/m  is  an  integer  that  can  be  removed  from  inside 
the  floor  brackets.  Thus  the  original  sum  can  be  expanded  into  the  following 
tableau: 


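$$\begin{array}{rlll}
  & \left\lfloor \dfrac{x}{m} \right\rfloor & + \; \dfrac{0}{m} & - \; \dfrac{0 \bmod m}{m} \\[2ex]
+ & \left\lfloor \dfrac{x + n \bmod m}{m} \right\rfloor & + \; \dfrac{n}{m} & - \; \dfrac{n \bmod m}{m} \\[2ex]
+ & \left\lfloor \dfrac{x + 2n \bmod m}{m} \right\rfloor & + \; \dfrac{2n}{m} & - \; \dfrac{2n \bmod m}{m} \\[1ex]
  & \qquad \vdots & \quad \vdots & \quad \vdots \\[1ex]
+ & \left\lfloor \dfrac{x + (m-1)n \bmod m}{m} \right\rfloor & + \; \dfrac{(m-1)n}{m} & - \; \dfrac{(m-1)n \bmod m}{m} .
\end{array}$$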




Lemma now,
dilemma later.


When we experimented with small values of m, these three columns led
respectively to $a \lfloor x/a \rfloor$, bn, and c.

In particular, we can see how b arises. The second column is an arithmetic
progression, whose sum we know: it’s the average of the first and last terms,
times the number of terms:

$$\frac{1}{2} \left( 0 + \frac{(m-1)n}{m} \right) \cdot m = \frac{(m-1)n}{2} .$$

So our guess that $b = (m - 1)/2$ has been verified.

The  first  and  third  columns  seem  tougher;  to  determine  a and  c we  must 
take a closer look at the sequence of numbers

$$0 \bmod m, \; n \bmod m, \; 2n \bmod m, \; \ldots, \; (m-1)n \bmod m .$$


Suppose, for example, that m = 12 and n = 5. If we think of the
sequence  as  times  on  a clock,  the  numbers  are  0 o’clock  (we  take  12  o’clock 
to  be  0 o’clock),  then  5 o’clock,  10  o’clock,  3 o’clock  (=  15  o’clock),  8 o’clock, 
and  so  on.  It  turns  out  that  we  hit  every  hour  exactly  once. 

Now  suppose  m = 12  and  n = 8.  The  numbers  are  0 o’clock,  8 o’clock, 
4 o’clock  (=  16  o’clock),  but  then  0,  8,  and  4 repeat.  Since  both  8 and  12  are 
multiples  of  4,  and  since  the  numbers  start  at  0 (also  a multiple  of  4),  there’s 
no way to break out of this pattern; they must all be multiples of 4.

In these two cases we have gcd(12, 5) = 1 and gcd(12, 8) = 4. The general
rule, which we will prove next chapter, states that if d = gcd(m, n) then we
get the numbers $0, d, 2d, \ldots, m - d$ in some order, followed by $d - 1$ more
copies of the same sequence. For example, with m = 12 and n = 8 the pattern
0, 8, 4 occurs four times.
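
The clock experiment is easy to repeat by machine. The following is only a sketch for experimentation, not a proof; the helper names `residues` and `repeats_with_period` are our own inventions for illustration.

```python
from math import gcd

def residues(m, n):
    """The sequence 0 mod m, n mod m, 2n mod m, ..., (m-1)n mod m."""
    return [(k * n) % m for k in range(m)]

def repeats_with_period(m, n):
    """Check the stated rule: with d = gcd(m, n), the first m/d residues are
    0, d, 2d, ..., m-d in some order, and that block then repeats d times."""
    d = gcd(m, n)
    seq = residues(m, n)
    block = seq[: m // d]
    return sorted(block) == list(range(0, m, d)) and block * d == seq

print(residues(12, 5))   # every hour of the clock appears exactly once
print(residues(12, 8))   # the pattern 0, 8, 4 keeps repeating
print(all(repeats_with_period(m, n) for m in range(1, 40) for n in range(0, 60)))
```

Trying a range of m and n like this is no substitute for the proof promised in the next chapter, but it makes the rule believable.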

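The first column of our sum now makes complete sense. It contains
d copies of the terms $\lfloor x/m \rfloor$, $\lfloor (x + d)/m \rfloor$, ..., $\lfloor (x + m - d)/m \rfloor$, in some
order, so its sum is

$$\begin{aligned}
d \left( \left\lfloor \frac{x}{m} \right\rfloor + \left\lfloor \frac{x + d}{m} \right\rfloor + \cdots + \left\lfloor \frac{x + m - d}{m} \right\rfloor \right)
 &= d \left( \left\lfloor \frac{x/d}{m/d} \right\rfloor + \left\lfloor \frac{x/d + 1}{m/d} \right\rfloor + \cdots + \left\lfloor \frac{x/d + m/d - 1}{m/d} \right\rfloor \right) \\
 &= d \left\lfloor \frac{x}{d} \right\rfloor .
\end{aligned}$$

This last step is yet another application of (3.26). Our guess for a has been
verified:

$$a = d = \gcd(m, n) .$$
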
Also, as we guessed, we can now compute c, because the third column
has become easy to fathom. It contains d copies of the arithmetic progression
$0/m, d/m, 2d/m, \ldots, (m - d)/m$, so its sum is

$$d \left( \frac{1}{2} \left( 0 + \frac{m - d}{m} \right) \cdot \frac{m}{d} \right) = \frac{m - d}{2} ;$$

the third column is actually subtracted, not added, so we have

$$c = \frac{d - m}{2} .$$


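End of mystery, end of quest. The desired closed form is

$$\sum_{0 \le k < m} \left\lfloor \frac{nk + x}{m} \right\rfloor = d \left\lfloor \frac{x}{d} \right\rfloor + \frac{m - 1}{2} n + \frac{d - m}{2} ,$$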


where d = gcd(m, n). As a check, we can make sure this works in the special
cases n = 0 and n = 1 that we knew before: When n = 0 we get d =
gcd(m, 0) = m; the last two terms of the formula are zero so the formula
properly gives $m \lfloor x/m \rfloor$. And for n = 1 we get d = gcd(m, 1) = 1; the last
two terms cancel nicely, and the sum is just $\lfloor x \rfloor$.
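
One advantage of a closed form is that it can be checked against the definition. The sketch below does this by brute force with exact rational arithmetic, so that no floating-point rounding can creep into the floors. The function names `floor_sum` and `closed_form` are assumptions introduced for this illustration.

```python
from fractions import Fraction
from math import floor, gcd

def floor_sum(m, n, x):
    """Sum of floor((n*k + x)/m) over 0 <= k < m, straight from the definition."""
    x = Fraction(x)
    return sum(floor((n * k + x) / m) for k in range(m))

def closed_form(m, n, x):
    """d*floor(x/d) + (m-1)*n/2 + (d-m)/2, where d = gcd(m, n)."""
    x = Fraction(x)
    d = gcd(m, n)
    return d * floor(x / d) + Fraction((m - 1) * n + d - m, 2)

# Compare the two on a grid of small cases, with x running over quarters.
ok = all(
    floor_sum(m, n, Fraction(j, 4)) == closed_form(m, n, Fraction(j, 4))
    for m in range(1, 13)
    for n in range(0, 20)
    for j in range(-20, 21)
)
print(ok)
```

A grid of small cases proves nothing in general, of course; it merely guards against slips in the algebra.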

By  manipulating  the  closed  form  a bit,  we  can  actually  make  it  symmetric 
in  m and  n: 


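$$\begin{aligned}
\sum_{0 \le k < m} \left\lfloor \frac{nk + x}{m} \right\rfloor
 &= d \left\lfloor \frac{x}{d} \right\rfloor + \frac{m - 1}{2} n + \frac{d - m}{2} \\
 &= d \left\lfloor \frac{x}{d} \right\rfloor + \frac{(m - 1)(n - 1)}{2} + \frac{m - 1}{2} + \frac{d - m}{2} \\
 &= d \left\lfloor \frac{x}{d} \right\rfloor + \frac{(m - 1)(n - 1)}{2} + \frac{d - 1}{2} .
\end{aligned} \tag{3.32}$$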


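This is astonishing, because there’s no reason to suspect that such a sum
should be symmetrical. We have proved a “reciprocity law,”

$$\sum_{0 \le k < m} \left\lfloor \frac{nk + x}{m} \right\rfloor = \sum_{0 \le k < n} \left\lfloor \frac{mk + x}{n} \right\rfloor , \qquad \text{integers } m, n > 0.$$

Yup, I’m floored.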


For  example,  if  m = 41  and  n = 127,  the  left  sum  has  41  terms  and  the  right 
has  127;  but  they  still  come  out  equal,  for  all  real  x. 
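
Skeptics can try the example themselves. This is a small sketch under our own naming (`floored_progression_sum` is not from the text); it uses exact fractions for x, and the sample values of x are arbitrary.

```python
from fractions import Fraction
from math import floor

def floored_progression_sum(m, n, x):
    """Sum of floor((n*k + x)/m) over 0 <= k < m."""
    x = Fraction(x)
    return sum(floor((n * k + x) / m) for k in range(m))

m, n = 41, 127
for x in (Fraction(0), Fraction(7, 3), Fraction(-22, 7), Fraction(1000, 9)):
    left = floored_progression_sum(m, n, x)    # 41 terms
    right = floored_progression_sum(n, m, x)   # 127 terms
    print(x, left, right, left == right)
```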
Exercises

Warmups 

1 When  we  analyzed  the  Josephus  problem  in  Chapter  1,  we  represented 
an arbitrary positive integer n in the form $n = 2^m + l$, where $0 \le l < 2^m$.
Give  explicit  formulas  for  l and  m as  functions  of  n,  using  floor  and/or 
ceiling  brackets. 

2 What  is  a formula  for  the  nearest  integer  to  a given  real  number  x?  In  case 
of  ties,  when  x is  exactly  halfway  between  two  integers,  give  an  expression 
that rounds (a) up, that is, to $\lceil x \rceil$; (b) down, that is, to $\lfloor x \rfloor$.

3 Evaluate $\lfloor \lfloor m\alpha \rfloor n / \alpha \rfloor$, when m and n are positive integers and $\alpha$ is an
irrational number greater than n.

4 The  text  describes  problems  at  levels  1 through  5.  What  is  a level  0 
problem?  (This,  by  the  way,  is  not  a level  0 problem.) 

5 Find a necessary and sufficient condition that $\lfloor nx \rfloor = n \lfloor x \rfloor$, when n is a
positive  integer.  (Your  condition  should  involve  {x}.) 

6 Can something interesting be said about $\lfloor f(x) \rfloor$ when f(x) is a continuous,
monotonically  decreasing  function  that  takes  integer  values  only  when 
x is  an  integer? 

7 Solve  the  recurrence 


$$X_n = n , \qquad \text{for } 0 \le n < m;$$
$$X_n = X_{n-m} + 1 , \qquad \text{for } n \ge m.$$


You know you’re
in college when the
book doesn’t tell
you how to pronounce
‘Dirichlet’.

8 Prove the Dirichlet box principle: If n objects are put into m boxes,
some box must contain $\ge \lceil n/m \rceil$ objects, and some box must contain
$\le \lfloor n/m \rfloor$.

9 Egyptian mathematicians in 1800 B.C. represented rational numbers between
0 and 1 as sums of unit fractions $1/x_1 + \cdots + 1/x_k$, where the x’s
were distinct positive integers. For example, they wrote $\frac{1}{3} + \frac{1}{15}$ instead
of $\frac{2}{5}$. Prove that it is always possible to do this in a systematic way: If
$0 < m/n < 1$, then

$$\frac{m}{n} = \frac{1}{q} + \left\{ \text{representation of } \frac{m}{n} - \frac{1}{q} \right\} , \qquad q = \left\lceil \frac{n}{m} \right\rceil .$$

(This is Fibonacci’s algorithm, due to Leonardo Fibonacci, A.D. 1202.)
Basics

10  Show  that  the  expression 

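$$\left\lceil \frac{2x + 1}{2} \right\rceil - \left\lceil \frac{2x + 1}{4} \right\rceil + \left\lfloor \frac{2x + 1}{4} \right\rfloor$$

is always either $\lfloor x \rfloor$ or $\lceil x \rceil$. In what circumstances does each case arise?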

11  Give  details  of  the  proof  alluded  to  in  the  text,  that  the  open  interval 
$(\alpha\,..\,\beta)$ contains exactly $\lceil \beta \rceil - \lfloor \alpha \rfloor - 1$ integers when $\alpha < \beta$. Why does
the case $\alpha = \beta$ have to be excluded in order to make the proof correct?

12 Prove that

$$\left\lceil \frac{n}{m} \right\rceil = \left\lfloor \frac{n + m - 1}{m} \right\rfloor$$

for  all  integers  n and  all  positive  integers  m.  [This  identity  gives  us 
another  way  to  convert  ceilings  to  floors  and  vice  versa,  instead  of  using 
the  reflective  law  (3.4).] 

13 Let $\alpha$ and $\beta$ be positive real numbers. Prove that Spec($\alpha$) and Spec($\beta$)
partition the positive integers if and only if $\alpha$ and $\beta$ are irrational and
$1/\alpha + 1/\beta = 1$.

14  Prove  or  disprove: 

$$(x \bmod ny) \bmod y = x \bmod y , \qquad \text{integer } n.$$

15  Is  there  an  identity  analogous  to  (3.26)  that  uses  ceilings  instead  of  floors? 

16 Prove that $n \bmod 2 = (1 - (-1)^n)/2$. Find and prove a similar expression
for n mod 3 in the form $a + b\omega^n + c\omega^{2n}$, where $\omega$ is the complex number
$(-1 + i\sqrt{3})/2$. Hint: $\omega^3 = 1$ and $1 + \omega + \omega^2 = 0$.

17 Evaluate the sum $\sum_{0 \le k < m} \lfloor x + k/m \rfloor$ in the case $x \ge 0$ by substituting
$\sum_j [1 \le j \le x + k/m]$ for $\lfloor x + k/m \rfloor$ and summing first on k. Does your
answer agree with (3.26)?

18 Prove that the boundary-value error term S in (3.30) is at most $\lceil \alpha^{-1} v \rceil$.
Hint:  Show  that  small  values  of  j are  not  involved. 

Homework  exercises 

19  Find  a necessary  and  sufficient  condition  on  the  real  number  b > 1 such 
that 

$$\lfloor \log_b x \rfloor = \lfloor \log_b \lfloor x \rfloor \rfloor$$


for  all  real  x > 1 . 
20 Find the sum of all multiples of x in the closed interval $[\alpha\,..\,\beta]$, when
x > 0.

21 How many of the numbers $2^m$, for $0 \le m \le M$, have leading digit 1 in
decimal notation?

22 Evaluate the sums $S_n = \sum_{k \ge 1} \lfloor n/2^k + \frac{1}{2} \rfloor$ and $T_n = \sum_{k \ge 1} 2^k \lfloor n/2^k + \frac{1}{2} \rfloor^2$.

23  Show  that  the  nth  element  of  the  sequence 

1,2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5,... 

is $\lfloor \sqrt{2n} + \frac{1}{2} \rfloor$. (The sequence contains exactly m occurrences of m.)

24  Exercise  13  establishes  an  interesting  relation  between  the  two  multisets 
Spec($\alpha$) and Spec($\alpha/(\alpha - 1)$), when $\alpha$ is any irrational number > 1,
because $1/\alpha + (\alpha - 1)/\alpha = 1$. Find (and prove) an interesting relation
between the two multisets Spec($\alpha$) and Spec($\alpha/(\alpha + 1)$), when $\alpha$ is any
positive real number.

25  Prove  or  disprove  that  the  Knuth  numbers,  defined  by  (3.16),  satisfy 

$K_n \ge n$ for all nonnegative n.

26  Show  that  the  auxiliary  Josephus  numbers  (3.20)  satisfy 

$$\left( \frac{q}{q - 1} \right)^n \le D_n^{(q)} \le q \left( \frac{q}{q - 1} \right)^n , \qquad \text{for } n \ge 0.$$

27 Prove that infinitely many of the numbers $D_n^{(3)}$ defined by (3.20) are
even, and that infinitely many are odd.

28  Solve  the  recurrence 

$$a_0 = 1;$$
$$a_n = a_{n-1} + \lfloor \sqrt{a_{n-1}} \rfloor , \qquad \text{for } n > 0.$$

29 Show that, in addition to (3.31), we have

$$D(\alpha, n) \ge D(\alpha', \lfloor \alpha n \rfloor) - \alpha^{-1} - 2 .$$

30  Show  that  the  recurrence 

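$$X_0 = m,$$
$$X_n = X_{n-1}^2 - 2 , \qquad \text{for } n > 0,$$

has the solution $X_n = \lceil \alpha^{2^n} \rceil$, if m is an integer greater than 2, where
$\alpha + \alpha^{-1} = m$ and $\alpha > 1$. For example, if m = 3 the solution is

$$X_n = \lceil \phi^{2^{n+1}} \rceil , \qquad \phi = \frac{1 + \sqrt{5}}{2} , \qquad \alpha = \phi^2 .$$

31 Prove or disprove: $\lfloor x \rfloor + \lfloor y \rfloor + \lfloor x + y \rfloor \le \lfloor 2x \rfloor + \lfloor 2y \rfloor$.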

32 Let $\|x\| = \min(x - \lfloor x \rfloor, \lceil x \rceil - x)$ denote the distance from x to the nearest
integer. What is the value of

$$\sum_k 2^k \, \|x/2^k\|^2 \; ?$$

(Note that this sum can be doubly infinite. For example, when x = 1/3
the terms are nonzero as $k \to -\infty$ and also as $k \to +\infty$.)

Exam  problems 

33 A circle, $2n - 1$ units in diameter, has been drawn symmetrically on a
$2n \times 2n$ chessboard, illustrated here for n = 3:


a How many cells of the board contain a segment of the circle?

b Find a function f(n, k) such that exactly $\sum_{k=1}^{n-1} f(n, k)$ cells of the
board lie entirely within the circle.

34 Let $f(n) = \sum_{k=1}^{n} \lceil \lg k \rceil$.

a Find a closed form for f(n), when $n \ge 1$.
b Prove that $f(n) = n - 1 + f(\lceil n/2 \rceil) + f(\lfloor n/2 \rfloor)$ for all $n \ge 1$.
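
35 Simplify the formula $\lfloor (n + 1)^2 \, n! \, e \rfloor \bmod n$.

Simplify it, but
don’t change the
value.

36 Assuming that n is a nonnegative integer, find a closed form for the sum

$$\sum_{1 < k < 2^{2^n}} \frac{1}{2^{\lfloor \lg k \rfloor} \, 4^{\lfloor \lg \lg k \rfloor}} .$$

37 Prove the identity

$$\sum_{0 \le k < m} \left( \left\lfloor \frac{m + k}{n} \right\rfloor - \left\lfloor \frac{k}{n} \right\rfloor \right) = \left\lfloor \frac{m^2}{n} \right\rfloor - \left\lfloor \frac{\min(m \bmod n, \, (-m) \bmod n)^2}{n} \right\rfloor$$

for all positive integers m and n.

38 Let $x_1, \ldots, x_n$ be real numbers such that the identity

$$\sum_{k=1}^{n} \lfloor m x_k \rfloor = \left\lfloor m \sum_{1 \le k \le n} x_k \right\rfloor$$

holds for all positive integers m. Prove something interesting about
$x_1, \ldots, x_n$.
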
39 Prove that the double sum $\sum_{0 \le k \le \log_b x} \sum_{0 < j < b} \lceil (x + j b^k)/b^{k+1} \rceil$ equals
$(b - 1)(\lfloor \log_b x \rfloor + 1) + \lceil x \rceil - 1$, for every real number $x \ge 1$ and every
integer b > 1.

40 The spiral function $\sigma(n)$, indicated in the diagram below, maps a nonnegative
integer n onto an ordered pair of integers (x(n), y(n)). For
example, it maps n = 9 onto the ordered pair (1, 2).


a Prove that if $m = \lfloor \sqrt{n} \rfloor$,

$$x(n) = (-1)^m \left( (n - m(m + 1)) \cdot [\lfloor 2\sqrt{n} \rfloor \text{ is even}] + \lceil \tfrac{1}{2} m \rceil \right) ,$$

and find a similar formula for y(n). Hint: Classify the spiral into
segments $W_k$, $S_k$, $E_k$, $N_k$ according as $\lfloor 2\sqrt{n} \rfloor = 4k - 2$, $4k - 1$, $4k$,
$4k + 1$.

b Prove that, conversely, we can determine n from $\sigma(n)$ by a formula
of the form

$$n = (2k)^2 \pm (2k + x(n) + y(n)) , \qquad k = \max(|x(n)|, |y(n)|).$$

Give a rule for when the sign is + and when the sign is $-$.

Bonus  problems 

41 Let f and g be increasing functions such that the sets {f(1), f(2), ...} and
{g(1), g(2), ...} partition the positive integers. Suppose that f and g are
related by the condition g(n) = f(f(n)) + 1 for all n > 0. Prove that
$f(n) = \lfloor n\phi \rfloor$ and $g(n) = \lfloor n\phi^2 \rfloor$, where $\phi = (1 + \sqrt{5})/2$.

42 Do there exist real numbers $\alpha$, $\beta$, and $\gamma$ such that Spec($\alpha$), Spec($\beta$), and
Spec($\gamma$) together partition the set of positive integers?

43 Find an interesting interpretation of the Knuth numbers, by unfolding
the recurrence (3.16).

44 Show that there are integers $a_n^{(q)}$ and $d_n^{(q)}$ such that

$$a_n^{(q)} = \frac{D_{n-1}^{(q)} + d_n^{(q)}}{q - 1} = \frac{D_n^{(q)} + d_n^{(q)}}{q} , \qquad \text{for } n > 0,$$

when $D_n^{(q)}$ is the solution to (3.20). Use this fact to obtain another form
of the solution to the generalized Josephus problem:

$$J_q(n) = 1 + d_k^{(q)} + q(n - a_k^{(q)}) , \qquad \text{for } a_k^{(q)} \le n < a_{k+1}^{(q)} .$$

45  Extend  the  trick  of  exercise  30  to  find  a closed-form  solution  to 


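$$Y_0 = m,$$
$$Y_n = 2Y_{n-1}^2 - 1 , \qquad \text{for } n > 0,$$

if m is a positive integer.

46 Prove that if $n = \lfloor (\sqrt{2}^{\,l} + \sqrt{2}^{\,l-1}) m \rfloor$, where m and l are nonnegative
integers, then $\lfloor \sqrt{2n(n + 1)} \rfloor = \lfloor (\sqrt{2}^{\,l+1} + \sqrt{2}^{\,l}) m \rfloor$. Use this remarkable
property to find a closed form solution to the recurrence

$$L_0 = a , \qquad \text{integer } a > 0;$$
$$L_n = \left\lfloor \sqrt{2 L_{n-1} (L_{n-1} + 1)} \right\rfloor , \qquad \text{for } n > 0.$$

Hint: $\lfloor \sqrt{2n(n + 1)} \rfloor = \lfloor \sqrt{2} (n + \frac{1}{2}) \rfloor$.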

47  The  function  f(x)  is  said  to  be  replicative  if  it  satisfies 

$$f(mx) = f(x) + f\left( x + \frac{1}{m} \right) + \cdots + f\left( x + \frac{m - 1}{m} \right)$$

for every positive integer m. Find necessary and sufficient conditions on
the real number c for the following functions to be replicative:

a $f(x) = x + c$.
b $f(x) = [x + c \text{ is an integer}]$.
c $f(x) = \max(\lfloor x \rfloor, c)$.
d $f(x) = x + c \lfloor x \rfloor - \frac{1}{2} [x \text{ is not an integer}]$.

48 Find a necessary and sufficient condition on the real numbers $0 \le \alpha < 1$
and $\beta \ge 0$ such that we can determine $\alpha$ and $\beta$ from the infinite multiset
of values

$$\{\, \lfloor n\alpha \rfloor + \lfloor n\beta \rfloor \mid n > 0 \,\} .$$

Research problems

49  Find  a necessary  and  sufficient  condition  on  the  nonnegative  real  numbers 
$\alpha$ and $\beta$ such that we can determine $\alpha$ and $\beta$ from the infinite multiset
of values

$$\{\, \lfloor \lfloor n\alpha \rfloor \beta \rfloor \mid n > 0 \,\} .$$

50 Let x be a real number $\ge \phi = \frac{1}{2}(1 + \sqrt{5})$. The solution to the recurrence

$$Z_0(x) = x,$$
$$Z_n(x) = Z_{n-1}(x)^2 - 1 , \qquad \text{for } n > 0,$$

can be written $Z_n(x) = \lceil f(x)^{2^n} \rceil$, if x is an integer, where

$$f(x) = \lim_{n \to \infty} Z_n(x)^{1/2^n} ,$$

because $Z_n(x) - 1 < f(x)^{2^n} < Z_n(x)$. What interesting properties does
this function f(x) have?

51 Given nonnegative real numbers $\alpha$ and $\beta$, let

$$\operatorname{Spec}(\alpha; \beta) = \{\, \lfloor \alpha + \beta \rfloor, \lfloor 2\alpha + \beta \rfloor, \lfloor 3\alpha + \beta \rfloor, \ldots \,\}$$

be a multiset that generalizes Spec($\alpha$) = Spec($\alpha$; 0). Prove or disprove:
If the $m \ge 3$ multisets Spec($\alpha_1$; $\beta_1$), Spec($\alpha_2$; $\beta_2$), ..., Spec($\alpha_m$; $\beta_m$)
partition the positive integers, and if the parameters $\alpha_1 < \alpha_2 < \cdots < \alpha_m$
are rational, then

$$\alpha_k = \frac{2^m - 1}{2^{k-1}} , \qquad \text{for } 1 \le k \le m.$$

52  Fibonacci’s  algorithm  (exercise  9)  is  “greedy”  in  the  sense  that  it  chooses 
the  least  conceivable  q at  every  step.  A more  complicated  algorithm  is 
known  by  which  every  fraction  m/n  with  n odd  can  be  represented  as  a 
sum  of  distinct  unit  fractions  1 /qi  + . . . + 1 /qk  with  odd  denominators. 
Does  the  greedy  algorithm  for  such  a representation  always  terminate? 


Number  Theory 


INTEGERS ARE CENTRAL to the discrete mathematics we are emphasizing
in this book. Therefore we want to explore the theory of numbers, an
important  branch  of  mathematics  concerned  with  the  properties  of  integers. 

We tested the number theory waters in the previous chapter, by introducing
binary operations called ‘mod’ and ‘gcd’. Now let’s plunge in and
really  immerse  ourselves  in  the  subject. 

4.1  DIVISIBILITY 

We  say  that  m divides  n (or  n is  divisible  by  m)  if  m > 0 and  the 
ratio  n/m  is  an  integer.  This  property  underlies  all  of  number  theory,  so  it’s 
convenient  to  have  a special  notation  for  it.  We  therefore  write 

$$m \backslash n \iff m > 0 \text{ and } n = mk \text{ for some integer } k. \tag{4.1}$$

(The notation ‘m|n’ is actually much more common than ‘m\n’ in current
mathematics literature. But vertical lines are overused (for absolute values,
set delimiters, conditional probabilities, etc.) and backward slashes are
underused. Moreover, ‘m\n’ gives an impression that m is the denominator of
an implied ratio. So we shall boldly let our divisibility symbol lean leftward.)

If m does not divide n we write ‘$m \not\backslash n$’.

There’s  a similar  relation,  “n  is  a multiple  of  m,”  which  means  almost 
the  same  thing  except  that  m doesn’t  have  to  be  positive.  In  this  case  we 
simply  mean  that  n = mk  for  some  integer  k.  Thus,  for  example,  there’s  only 
one  multiple  of  0 (namely  0),  but  nothing  is  divisible  by  0.  Every  integer  is 
a multiple  of  -1,  but  no  integer  is  divisible  by  -1  (strictly  speaking).  These 
definitions apply when m and n are any real numbers; for example, $2\pi$ is
divisible by $\pi$. But we’ll almost always be using them when m and n are
integers.  After  all,  this  is  number  theory. 


In  other  words,  be 
prepared  to  drown. 


“. . . no  integer  is 
divisible  by  -1 
(strictly  speaking).” 
-Graham,  Knuth, 
and  Patashnik  [131] 
In Britain we call
this ‘hcf’ (highest
common factor).


Not to be confused
with the greatest
common multiple.


(Remember that
m’ or n’ can be
negative.)


The  greatest  common  divisor  of  two  integers  m and  n is  the  largest 
integer  that  divides  them  both: 

$$\gcd(m, n) = \max\{\, k \mid k \backslash m \text{ and } k \backslash n \,\} . \tag{4.2}$$

For example, gcd(12, 18) = 6. This is a familiar notion, because it’s the
common factor that fourth graders learn to take out of a fraction m/n when
reducing it to lowest terms: 12/18 = (12/6)/(18/6) = 2/3. Notice that if
n > 0 we have gcd(0, n) = n, because any positive number divides 0, and
because n is the largest divisor of itself. The value of gcd(0, 0) is undefined.
Another familiar notion is the least common multiple,

$$\operatorname{lcm}(m, n) = \min\{\, k \mid k > 0, \; m \backslash k \text{ and } n \backslash k \,\} ; \tag{4.3}$$

this is undefined if $m \le 0$ or $n \le 0$. Students of arithmetic recognize this
as the least common denominator, which is used when adding fractions with
denominators m and n. For example, lcm(12, 18) = 36, and fourth graders
know that $\frac{7}{12} + \frac{1}{18} = \frac{21}{36} + \frac{2}{36} = \frac{23}{36}$. The lcm is somewhat analogous to the

gcd,  but  we  don’t  give  it  equal  time  because  the  gcd  has  nicer  properties. 

One  of  the  nicest  properties  of  the  gcd  is  that  it  is  easy  to  compute,  using 
a 2300-year-old  method  called  Euclid’s  algorithm.  To  calculate  gcd(m,  n), 
for given values $0 \le m < n$, Euclid’s algorithm uses the recurrence

gcd(0,n)  = n ; 

gcd(m,n)  = gcd(n  mod  m,  m)  , for  m > 0.  (4.4) 

Thus, for example, gcd(12, 18) = gcd(6, 12) = gcd(0, 6) = 6. The stated
recurrence is valid, because any common divisor of m and n must also be a
common divisor of both m and the number n mod m, which is $n - \lfloor n/m \rfloor m$.
There  doesn’t  seem  to  be  any  recurrence  for  lcm(m,n)  that’s  anywhere  near 
as  simple  as  this.  (See  exercise  2.) 
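
Recurrence (4.4) translates almost word for word into a program. Here is one possible rendering, written as a loop rather than a recursion; the name `euclid_gcd` is our own, and the convention of returning 0 for the undefined case gcd(0, 0) is an assumption of this sketch, not something the text prescribes.

```python
def euclid_gcd(m, n):
    """gcd of nonnegative integers m and n via recurrence (4.4):
    gcd(0, n) = n, and gcd(m, n) = gcd(n mod m, m) for m > 0.
    (For m = n = 0, where the gcd is undefined, this sketch returns 0.)"""
    while m > 0:
        m, n = n % m, m
    return n

print(euclid_gcd(12, 18))   # follows gcd(12, 18) = gcd(6, 12) = gcd(0, 6)
```

The loop does not need the hypothesis m < n: if m > n, the first step simply swaps the two arguments.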

Euclid’s  algorithm  also  gives  us  more:  We  can  extend  it  so  that  it  will 
compute  integers  m’  and  n’  satisfying 

$$m'm + n'n = \gcd(m, n) . \tag{4.5}$$

Here’s how. If m = 0, we simply take m’ = 0 and n’ = 1. Otherwise we
let r = n mod m and apply the method recursively with r and m in place of
m and n, computing $\bar{r}$ and $\bar{m}$ such that

$$\bar{r} r + \bar{m} m = \gcd(r, m) .$$

Since $r = n - \lfloor n/m \rfloor m$ and gcd(r, m) = gcd(m, n), this equation tells us that

$$\bar{r} (n - \lfloor n/m \rfloor m) + \bar{m} m = \gcd(m, n) .$$

The left side can be rewritten to show its dependency on m and n:

$$(\bar{m} - \lfloor n/m \rfloor \bar{r}) m + \bar{r} n = \gcd(m, n) ;$$

hence $m' = \bar{m} - \lfloor n/m \rfloor \bar{r}$ and $n' = \bar{r}$ are the integers we need in (4.5). For
example, in our favorite case m = 12, n = 18, this method gives $6 = 0 \cdot 0 + 1 \cdot 6 =
1 \cdot 6 + 0 \cdot 12 = (-1) \cdot 12 + 1 \cdot 18$.
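
The recursive construction just described can be sketched as follows. The function name `extended_gcd` and the variable names `r_bar` and `m_bar` (standing for the barred quantities) are our own choices, and the sketch assumes nonnegative integer arguments.

```python
def extended_gcd(m, n):
    """Return (d, m1, n1) with m1*m + n1*n == d == gcd(m, n),
    for nonnegative integers m and n, following the recursive method in the text."""
    if m == 0:
        return n, 0, 1                      # 0*0 + 1*n = n
    r = n % m
    d, r_bar, m_bar = extended_gcd(r, m)    # r_bar*r + m_bar*m == d
    # Substitute r = n - floor(n/m)*m and regroup by m and n.
    return d, m_bar - (n // m) * r_bar, r_bar

print(extended_gcd(12, 18))   # the coefficients of the text's example
```

The returned coefficients can be checked independently of the algorithm, simply by multiplying out m1*m + n1*n and verifying that the result divides both m and n.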

But  why  is  (4.5)  such  a neat  result?  The  main  reason  is  that  there’s  a 
sense  in  which  the  numbers  m’  and  n’  actually  prove  that  Euclid’s  algorithm 
has  produced  the  correct  answer  in  any  particular  case.  Let’s  suppose  that 
our  computer  has  told  us  after  a lengthy  calculation  that  gcd(m,  n)  = d and 
that  m’m  + n’n  = d;  but  we’re  skeptical  and  think  that  there’s  really  a 
greater  common  divisor,  which  the  machine  has  somehow  overlooked.  This 
cannot  be,  however,  because  any  common  divisor  of  m and  n has  to  divide 
m’m + n’n; so it has to divide d; so it has to be $\le d$. Furthermore we can
easily check that d does divide both m and n. (Algorithms that output their
own proofs of correctness are called self-certifying.)

We’ll  be  using  (4.5)  a lot  in  the  rest  of  this  chapter.  One  of  its  important 
consequences  is  the  following  mini-theorem: 

$$k \backslash m \text{ and } k \backslash n \iff k \backslash \gcd(m, n) . \tag{4.6}$$

(Proof:  If  k divides  both  m and  n,  it  divides  m’m  + n’n,  so  it  divides 
gcd(m, n). Conversely, if k divides gcd(m, n), it divides a divisor of m and a
divisor  of  n,  so  it  divides  both  m and  n.)  We  always  knew  that  any  common 
divisor  of  m and  n must  be  less  than  or  equal  to  their  gcd;  that’s  the 
definition  of  greatest  common  divisor.  But  now  we  know  that  any  common 
divisor  is,  in  fact,  a divisor  of  their  gcd. 

Sometimes  we  need  to  do  sums  over  all  divisors  of  n.  In  this  case  it’s 
often  useful  to  use  the  handy  rule 

$$\sum_{m \backslash n} a_m = \sum_{m \backslash n} a_{n/m} , \qquad \text{integer } n > 0, \tag{4.7}$$

which holds since n/m runs through all divisors of n when m does. For
example, when n = 12 this says that $a_1 + a_2 + a_3 + a_4 + a_6 + a_{12} = a_{12} +
a_6 + a_4 + a_3 + a_2 + a_1$.

There’s  also  a slightly  more  general  identity, 

$$\sum_{m \backslash n} a_m = \sum_k \sum_{m > 0} a_m [n = mk] , \tag{4.8}$$

which is an immediate consequence of the definition (4.1). If n is positive, the
right-hand side of (4.8) is $\sum_{k \backslash n} a_{n/k}$; hence (4.8) implies (4.7). And equation
(4.8) works also when n is negative. (In such cases, the nonzero terms on the
right occur when k is the negative of a divisor of n.)

Moreover,  a double  sum  over  divisors  can  be  “interchanged”  by  the  law 

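$$\sum_{m \backslash n} \sum_{k \backslash m} a_{k,m} = \sum_{k \backslash n} \sum_{l \backslash (n/k)} a_{k,kl} . \tag{4.9}$$

For example, this law takes the following form when n = 12:

$$\begin{aligned}
& a_{1,1} + (a_{1,2} + a_{2,2}) + (a_{1,3} + a_{3,3}) \\
& \quad + (a_{1,4} + a_{2,4} + a_{4,4}) + (a_{1,6} + a_{2,6} + a_{3,6} + a_{6,6}) \\
& \quad + (a_{1,12} + a_{2,12} + a_{3,12} + a_{4,12} + a_{6,12} + a_{12,12}) \\
& = (a_{1,1} + a_{1,2} + a_{1,3} + a_{1,4} + a_{1,6} + a_{1,12}) \\
& \quad + (a_{2,2} + a_{2,4} + a_{2,6} + a_{2,12}) + (a_{3,3} + a_{3,6} + a_{3,12}) \\
& \quad + (a_{4,4} + a_{4,12}) + (a_{6,6} + a_{6,12}) + a_{12,12} .
\end{aligned}$$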

We  can  prove  (4.9)  with  Iversonian  manipulation.  The  left-hand  side  is 

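$$\sum_{j,l} \sum_{k,m > 0} a_{k,m} [n = jm][m = kl] = \sum_j \sum_{k,l > 0} a_{k,kl} [n = jkl] ;$$

the right-hand side is

$$\sum_{j,m} \sum_{k,l > 0} a_{k,kl} [n = jk][n/k = ml] = \sum_m \sum_{k,l > 0} a_{k,kl} [n = mlk] ,$$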

which  is  the  same  except  for  renaming  the  indices.  This  example  indicates 
that  the  techniques  we’ve  learned  in  Chapter  2 will  come  in  handy  as  we  study 
number  theory. 
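
Laws like (4.7) and (4.9) say that two sums range over the same terms in different orders, so they can be tested by listing the index sets. The sketch below does that; the helper names `divisors`, `left_pairs`, and `right_pairs` are introduced here only for illustration.

```python
def divisors(n):
    """All positive divisors of a positive integer n, in increasing order."""
    return [m for m in range(1, n + 1) if n % m == 0]

def left_pairs(n):
    """Index pairs (k, m) on the left of (4.9): m divides n and k divides m."""
    return [(k, m) for m in divisors(n) for k in divisors(m)]

def right_pairs(n):
    """Index pairs (k, k*l) on the right of (4.9): k divides n and l divides n/k."""
    return [(k, k * l) for k in divisors(n) for l in divisors(n // k)]

n = 12
# (4.7): n/m runs through the divisors of n, in reverse order, when m does.
print([n // m for m in divisors(n)] == divisors(n)[::-1])
# (4.9): both double sums involve exactly the same index pairs.
print(sorted(left_pairs(n)) == sorted(right_pairs(n)), len(left_pairs(n)))
```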

4.2  PRIMES 


How about the p in
‘explicitly’?

A positive integer p is called prime if it has just two divisors, namely
1 and p. Throughout the rest of this chapter, the letter p will always stand
for a prime number, even when we don’t say so explicitly. By convention,
1 isn’t prime, so the sequence of primes starts out like this:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, ....

Some numbers look prime but aren’t, like 91 (= 7·13) and 161 (= 7·23). These
numbers and others that have three or more divisors are called composite.
Every integer greater than 1 is either prime or composite, but not both.

Primes are of great importance, because they’re the fundamental building
blocks of all the positive integers. Any positive integer n can be written as a
product of primes,

$$n = p_1 \ldots p_m = \prod_{k=1}^{m} p_k , \qquad p_1 \le \cdots \le p_m . \tag{4.10}$$

For example, 12 = 2·2·3; 11011 = 7·11·11·13; 11111 = 41·271. (Products
denoted by $\prod$ are analogous to sums denoted by $\sum$, as explained in exercise 2.25.
If m = 0, we consider this to be an empty product, whose value
is 1 by definition; that’s the way n = 1 gets represented by (4.10).) Such a
factorization is always possible because if n > 1 is not prime it has a divisor
$n_1$ such that $1 < n_1 < n$; thus we can write $n = n_1 \cdot n_2$, and (by induction)
we know that $n_1$ and $n_2$ can be written as products of primes.
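
For small numbers a factorization of the form (4.10) can be found mechanically. The sketch below uses plain trial division, which is a different (and much cruder) route than the inductive argument just given, and is offered only as a way to experiment; the name `prime_factors` is our own.

```python
def prime_factors(n):
    """Return the primes p1 <= p2 <= ... <= pm whose product is n, for an integer n >= 1.
    Uses trial division; the empty list represents n = 1 (the empty product)."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:     # divide out p as often as possible
            factors.append(p)
            n //= p
        p += 1
    if n > 1:                 # whatever remains has no divisor <= its square root, so it is prime
        factors.append(n)
    return factors

for n in (1, 12, 91, 161, 11011, 11111):
    print(n, prime_factors(n))
```

Because every smaller prime has already been divided out by the time p is tested, only primes ever get appended, and they come out in nondecreasing order.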

Moreover, the expansion in (4.10) is unique: There’s only one way to
write n as a product of primes in nondecreasing order. This statement is
called the Fundamental Theorem of Arithmetic, and it seems so obvious that
we might wonder why it needs to be proved. How could there be two different
sets of primes with the same product? Well, there can’t, but the reason isn’t
simply “by definition of prime numbers.” For example, if we consider the set
of all real numbers of the form $m + n\sqrt{10}$ when m and n are integers, the
product of any two such numbers is again of the same form, and we can call
such a number “prime” if it can’t be factored in a nontrivial way. The number
6 has two representations, $2 \cdot 3 = (4 + \sqrt{10})(4 - \sqrt{10})$; yet exercise 36 shows
that 2, 3, $4 + \sqrt{10}$, and $4 - \sqrt{10}$ are all “prime” in this system.

Therefore we should prove rigorously that (4.10) is unique. There is certainly only one possibility when n = 1, since the product must be empty in that case; so let’s suppose that n > 1 and that all smaller numbers factor uniquely. Suppose we have two factorizations

$n = p_1 \ldots p_m = q_1 \ldots q_k , \qquad p_1 \le \cdots \le p_m \text{ and } q_1 \le \cdots \le q_k ,$

where the p’s and q’s are all prime. We will prove that $p_1 = q_1$. If not, we can assume that $p_1 < q_1$, making $p_1$ smaller than all the q’s. Since $p_1$ and $q_1$ are prime, their gcd must be 1; hence Euclid’s self-certifying algorithm gives us integers a and b such that $a p_1 + b q_1 = 1$. Therefore

$a p_1 q_2 \ldots q_k + b q_1 q_2 \ldots q_k = q_2 \ldots q_k .$

Now $p_1$ divides both terms on the left, since $q_1 q_2 \ldots q_k = n$; hence $p_1$ divides the right-hand side, $q_2 \ldots q_k$. Thus $q_2 \ldots q_k / p_1$ is an integer, and $q_2 \ldots q_k$ has a prime factorization in which $p_1$ appears. But $q_2 \ldots q_k < n$, so it has a unique factorization (by induction), and that factorization consists only of q’s, all of which exceed $p_1$. This contradiction shows that $p_1$ must be equal to $q_1$ after all. Therefore we can divide both of n’s factorizations by $p_1$, obtaining $p_2 \ldots p_m = q_2 \ldots q_k < n$. The other factors must likewise be equal (by induction), so our proof of uniqueness is complete.
It’s the factorization, not the theorem, that’s unique.


Sometimes  it’s  more  useful  to  state  the  Fundamental  Theorem  in  another 
way:  Every  positive  integer  can  be  written  uniquely  in  the  form 

$n = \prod_p p^{n_p} , \qquad \text{where each } n_p \ge 0 .$    (4.11)

The  right-hand  side  is  a product  over  infinitely  many  primes;  but  for  any 
particular  n all  but  a few  exponents  are  zero,  so  the  corresponding  factors 
are  1.  Therefore  it’s  really  a finite  product,  just  as  many  “infinite”  sums  are 
really  finite  because  their  terms  are  mostly  zero. 

Formula (4.11) represents n uniquely, so we can think of the sequence $\langle n_2, n_3, n_5, \ldots \rangle$ as a number system for positive integers. For example, the prime-exponent representation of 12 is $\langle 2,1,0,0,\ldots \rangle$ and the prime-exponent representation of 18 is $\langle 1,2,0,0,\ldots \rangle$. To multiply two numbers, we simply add their representations. In other words,


$k = mn \iff k_p = m_p + n_p \text{ for all } p .$    (4.12)

This implies that

$m \backslash n \iff m_p \le n_p \text{ for all } p ,$    (4.13)

and it follows immediately that

$k = \gcd(m,n) \iff k_p = \min(m_p, n_p) \text{ for all } p ;$    (4.14)

$k = \operatorname{lcm}(m,n) \iff k_p = \max(m_p, n_p) \text{ for all } p .$    (4.15)

For example, since $12 = 2^2 \cdot 3^1$ and $18 = 2^1 \cdot 3^2$, we can get their gcd and lcm by taking the min and max of common exponents:

$\gcd(12,18) = 2^{\min(2,1)} \cdot 3^{\min(1,2)} = 2^1 \cdot 3^1 = 6 ;$
$\operatorname{lcm}(12,18) = 2^{\max(2,1)} \cdot 3^{\max(1,2)} = 2^2 \cdot 3^2 = 36 .$
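Rules (4.12), (4.14), and (4.15) can be tried out with exponent tables. This is a sketch under the assumption that a `collections.Counter` keyed by prime is a fair stand-in for the prime-exponent representation; all function names are hypothetical.

```python
from collections import Counter

def exponents(n):
    """Map each prime p to its exponent n_p in the factorization (4.11) of n >= 1."""
    exps = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            exps[d] += 1
            n //= d
        d += 1
    if n > 1:
        exps[n] += 1
    return exps

def from_exponents(exps):
    """Multiply the prime powers back together."""
    result = 1
    for p, e in exps.items():
        result *= p ** e
    return result

def gcd_by_exponents(m, n):
    # Counter & Counter keeps the minimum count per prime, as in (4.14)
    return from_exponents(exponents(m) & exponents(n))

def lcm_by_exponents(m, n):
    # Counter | Counter keeps the maximum count per prime, as in (4.15)
    return from_exponents(exponents(m) | exponents(n))
```

Adding two tables with `+` adds exponents prime by prime, which is rule (4.12) for products. Factoring first is far slower than Euclid's algorithm, so this is an illustration of the rules rather than a practical gcd routine.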

If the prime p divides a product mn then it divides either m or n, perhaps both, because of the unique factorization theorem. But composite numbers do not have this property. For example, the nonprime 4 divides $60 = 6\cdot10$, but it divides neither 6 nor 10. The reason is simple: In the factorization $60 = 6\cdot10 = (2\cdot3)(2\cdot5)$, the two prime factors of $4 = 2\cdot2$ have been split into two parts, hence 4 divides neither part. But a prime is unsplittable, so it must divide one of the original factors.


4.3  PRIME  EXAMPLES 

How many primes are there? A lot. In fact, infinitely many. Euclid proved this long ago in his Theorem 9:20, as follows. Suppose there were only finitely many primes, say k of them: 2, 3, 5, ..., $P_k$. Then, said Euclid, we should consider the number

$M = 2 \cdot 3 \cdot 5 \cdot \ldots \cdot P_k + 1 .$

None of the k primes can divide M, because each divides M − 1. Thus there must be some other prime that divides M; perhaps M itself is prime. This contradicts our assumption that 2, 3, ..., $P_k$ are the only primes, so there must indeed be infinitely many.

Euclid’s proof suggests that we define Euclid numbers by the recurrence

$e_n = e_1 e_2 \ldots e_{n-1} + 1 , \qquad \text{when } n \ge 1 .$    (4.16)

The sequence starts out

$e_1 = 1 + 1 = 2 ;$
$e_2 = 2 + 1 = 3 ;$
$e_3 = 2 \cdot 3 + 1 = 7 ;$
$e_4 = 2 \cdot 3 \cdot 7 + 1 = 43 ;$

these are all prime. But the next case, $e_5$, is $1807 = 13 \cdot 139$. It turns out that $e_6 = 3263443$ is prime, while

$e_7 = 547 \cdot 607 \cdot 1033 \cdot 31051 ;$
$e_8 = 29881 \cdot 67003 \cdot 9119521 \cdot 6212157481 .$

It is known that $e_9, \ldots, e_{17}$ are composite, and the remaining $e_n$ are probably composite as well. However, the Euclid numbers are all relatively prime to each other; that is,

$\gcd(e_m, e_n) = 1 , \qquad \text{when } m \ne n .$

Euclid’s algorithm (what else?) tells us this in three short steps, because $e_n \bmod e_m = 1$ when n > m:

$\gcd(e_m, e_n) = \gcd(1, e_m) = \gcd(0, 1) = 1 .$

Therefore, if we let $q_j$ be the smallest factor of $e_j$ for all $j \ge 1$, the primes $q_1$, $q_2$, $q_3$, ... are all different. This is a sequence of infinitely many primes.
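The first few Euclid numbers and their smallest factors are easy to compute directly. A minimal sketch follows, assuming trial division is fast enough for the first seven values; the helper names are hypothetical.

```python
from math import gcd

def euclid_numbers(count):
    """Return [e_1, ..., e_count], where e_n = e_1 e_2 ... e_{n-1} + 1 as in (4.16)."""
    numbers = []
    product = 1                      # empty product, so e_1 = 1 + 1 = 2
    for _ in range(count):
        e = product + 1
        numbers.append(e)
        product *= e
    return numbers

def smallest_factor(n):
    """Smallest divisor of n that exceeds 1 (necessarily prime), by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

e = euclid_numbers(7)
q = [smallest_factor(x) for x in e]          # the primes q_1, ..., q_7
pairwise_coprime = all(gcd(e[i], e[j]) == 1
                       for i in range(len(e)) for j in range(i + 1, len(e)))
```

Here `q` holds seven distinct primes, one per Euclid number, in line with the argument above.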

Let’s pause to consider the Euclid numbers from the standpoint of Chapter 1. Can we express $e_n$ in closed form? Recurrence (4.16) can be simplified by removing the three dots: If n > 1 we have


“Οἱ πρῶτοι ἀριθμοὶ πλείους εἰσὶ παντὸς τοῦ προτεθέντος πλήθους πρώτων ἀριθμῶν.”
- Euclid [80]
[Translation: “There are more primes than in any given set of primes.”]


$e_n = e_1 \ldots e_{n-2} e_{n-1} + 1 = (e_{n-1} - 1) e_{n-1} + 1 = e_{n-1}^2 - e_{n-1} + 1 .$
Or probably more, by the time you read this.


Thus $e_n$ has about twice as many decimal digits as $e_{n-1}$. Exercise 37 proves that there’s a constant $E \approx 1.264$ such that

$e_n = \lfloor E^{2^n} + \tfrac12 \rfloor .$    (4.17)

And exercise 60 provides a similar formula that gives nothing but primes:

$p_n = \lfloor P^{3^n} \rfloor ,$    (4.18)

for some constant P. But equations like (4.17) and (4.18) cannot really be considered to be in closed form, because the constants E and P are computed from the numbers $e_n$ and $p_n$ in a sort of sneaky way. No independent relation is known (or likely) that would connect them with other constants of mathematical interest.

Indeed,  nobody  knows  any  useful  formula  that  gives  arbitrarily  large 
primes  but  only  primes.  Computer  scientists  at  Chevron  Geosciences  did, 
however,  strike  mathematical  oil  in  1984.  Using  a program  developed  by 
David  Slowinski,  they  discovered  the  largest  prime  known  at  that  time, 

$2^{216091} - 1$

while testing a new Cray X-MP supercomputer. It’s easy to compute this number in a few milliseconds on a personal computer, because modern computers work in binary notation and this number is simply $(11\ldots1)_2$. All 216,091 of its bits are ‘1’. But it’s much harder to prove that this number is prime. In fact, just about any computation with it takes a lot of time, because it’s so large. For example, even a sophisticated algorithm requires several minutes just to convert $2^{216091} - 1$ to radix 10 on a PC. When printed out, its 65,050 decimal digits require 65 cents U.S. postage to mail first class.
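The claims about bits and digits can be checked without printing the number. The sketch below assumes Python's arbitrary-precision integers; it counts decimal digits with a logarithm instead of converting to radix 10, which sidesteps the slow conversion the text mentions.

```python
import math

p = 216091
mersenne = (1 << p) - 1            # shift a single 1 bit left p places, then subtract 1

bit_count = mersenne.bit_length()  # number of binary digits
ones = bin(mersenne).count("1")    # how many of those bits are 1

# 2**p is never a power of 10, so 2**p - 1 has the same number of decimal
# digits as 2**p, namely floor(p * log10(2)) + 1.
decimal_digits = math.floor(p * math.log10(2)) + 1
```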

Incidentally, $2^{216091} - 1$ is the number of moves necessary to solve the Tower of Hanoi problem when there are 216,091 disks. Numbers of the form

$2^p - 1$

(where p is prime, as always in this chapter) are called Mersenne numbers, after Father Marin Mersenne who investigated some of their properties in the seventeenth century. The Mersenne primes known to date occur for p = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243, 110503, 132049, and 216091.

The number $2^n - 1$ can’t possibly be prime if n is composite, because $2^{km} - 1$ has $2^m - 1$ as a factor:

$2^{km} - 1 = (2^m - 1)(2^{m(k-1)} + 2^{m(k-2)} + \cdots + 1) .$

But $2^p - 1$ isn’t always prime when p is prime; $2^{11} - 1 = 2047 = 23 \cdot 89$ is the smallest such nonprime. (Mersenne knew this.)

Factoring  and  primality  testing  of  large  numbers  are  hot  topics  nowadays. 
A summary  of  what  was  known  up  to  1981  appears  in  Section  4.5.4  of  [174], 
and  many  new  results  continue  to  be  discovered.  Pages  391-394  of  that  book 
explain  a special  way  to  test  Mersenne  numbers  for  primality. 
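The text does not spell out the special test. One standard method for Mersenne numbers is the Lucas-Lehmer test, and treating it as the method meant here is an assumption. A minimal sketch, with a hypothetical function name:

```python
def is_mersenne_prime(p):
    """Lucas-Lehmer test: decide whether 2**p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True                # 2**2 - 1 = 3; the loop below needs an odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m        # square, subtract 2, reduce modulo 2**p - 1
    return s == 0

small_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
mersenne_exponents = [p for p in small_primes if is_mersenne_prime(p)]
```

The exponent 11 is rejected, in agreement with $2^{11} - 1 = 23 \cdot 89$.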

For  most  of  the  last  two  hundred  years,  the  largest  known  prime  has 
been  a Mersenne  prime,  although  only  31  Mersenne  primes  are  known.  Many 
people  are  trying  to  find  larger  ones,  but  it’s  getting  tough.  So  those  really 
interested  in  fame  (if  not  fortune)  and  a spot  in  The  Guinness  Book  of  World 
Records might instead try numbers of the form $2^n k + 1$, for small values of k
like  3 or  5.  These  numbers  can  be  tested  for  primality  almost  as  quickly  as 
Mersenne  numbers  can;  exercise  4.5.4-27  of  [174]  gives  the  details. 

We  haven’t  fully  answered  our  original  question  about  how  many  primes 
there  are.  There  are  infinitely  many,  but  some  infinite  sets  are  “denser”  than 
others.  For  instance,  among  the  positive  integers  there  are  infinitely  many 
even  numbers  and  infinitely  many  perfect  squares,  yet  in  several  important 
senses  there  are  more  even  numbers  than  perfect  squares.  One  such  sense 
looks at the size of the nth value. The nth even integer is 2n and the nth perfect square is $n^2$; since 2n is much less than $n^2$ for large n, the nth even integer occurs much sooner than the nth perfect square, so we can say there are many more even integers than perfect squares. A similar sense looks at the number of values not exceeding x. There are $\lfloor x/2 \rfloor$ such even integers and $\lfloor \sqrt{x} \rfloor$ perfect squares; since x/2 is much larger than $\sqrt{x}$ for large x, again we can say there are many more even integers.

What can we say about the primes in these two senses? It turns out that the nth prime, $P_n$, is about n times the natural log of n:


Weird, I thought there were the same number of even integers as perfect squares, since there’s a one-to-one correspondence between them.


$P_n \sim n \ln n .$


(The symbol ‘∼’ can be read “is asymptotic to”; it means that the limit of the ratio $P_n / (n \ln n)$ is 1 as n goes to infinity.) Similarly, for the number of primes π(x) not exceeding x we have what’s known as the prime number theorem:

$\pi(x) \sim \dfrac{x}{\ln x} .$

Proving these two facts is beyond the scope of this book, although we can show easily that each of them implies the other. In Chapter 9 we will discuss the rates at which functions approach infinity, and we’ll see that the function $n \ln n$, our approximation to $P_n$, lies between 2n and $n^2$ asymptotically. Hence there are fewer primes than even integers, but there are more primes than perfect squares.
These formulas, which hold only in the limit as n or x → ∞, can be replaced by more exact estimates. For example, Rosser and Schoenfeld [253] have established the handy bounds

$\ln x - \tfrac32 < \dfrac{x}{\pi(x)} < \ln x - \tfrac12 , \qquad \text{for } x \ge 67 ;$    (4.19)

$n(\ln n + \ln\ln n - \tfrac32) < P_n < n(\ln n + \ln\ln n - \tfrac12) , \qquad \text{for } n \ge 20 .$    (4.20)
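Bounds (4.19) and (4.20) can be spot-checked numerically on a small range. This sketch assumes a cutoff of 10,000 (an arbitrary choice) and uses floating-point logarithms, so it is a sanity check rather than a proof; the names are hypothetical.

```python
import math
from bisect import bisect_right

def primes_up_to(limit):
    """All primes <= limit, by trial division against the primes found so far."""
    primes = []
    for c in range(2, limit + 1):
        is_prime = True
        for p in primes:
            if p * p > c:
                break
            if c % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(c)
    return primes

LIMIT = 10_000
primes = primes_up_to(LIMIT)

def check_419(x):
    """ln x - 3/2 < x / pi(x) < ln x - 1/2, meant for 67 <= x <= LIMIT."""
    pi_x = bisect_right(primes, x)         # number of primes not exceeding x
    return math.log(x) - 1.5 < x / pi_x < math.log(x) - 0.5

def check_420(n):
    """n(ln n + ln ln n - 3/2) < P_n < n(ln n + ln ln n - 1/2), meant for n >= 20."""
    p_n = primes[n - 1]                    # P_n is the nth prime, counting from P_1 = 2
    base = math.log(n) + math.log(math.log(n))
    return n * (base - 1.5) < p_n < n * (base - 0.5)
```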


If we look at a “random” integer n, the chances of its being prime are about one in ln n. For example, if we look at numbers near $10^{16}$, we’ll have to examine about $16 \ln 10 \approx 36.8$ of them before finding a prime. (It turns out that there are exactly 10 primes between $10^{16} - 370$ and $10^{16} - 1$.) Yet the distribution of primes has many irregularities. For example, all the numbers between $P_1 P_2 \ldots P_n + 2$ and $P_1 P_2 \ldots P_n + P_{n+1} - 1$ inclusive are composite. Many examples of “twin primes” p and p + 2 are known (5 and 7, 11 and 13, 17 and 19, 29 and 31, ..., 9999999999999641 and 9999999999999643, ...), yet nobody knows whether or not there are infinitely many pairs of twin primes. (See Hardy and Wright [150, §1.4 and §2.8].)
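The run of composites mentioned above is easy to exhibit for small n. A sketch, assuming trial division for the primality check; `composite_run` is a hypothetical name.

```python
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def composite_run(n):
    """Range from P_1...P_n + 2 through P_1...P_n + P_{n+1} - 1, inclusive."""
    primes = []
    candidate = 2
    while len(primes) < n + 1:             # collect P_1, ..., P_{n+1}
        if is_prime(candidate):
            primes.append(candidate)
        candidate += 1
    product = 1
    for p in primes[:n]:
        product *= p
    # range() excludes its stop value, so the last member is product + P_{n+1} - 1
    return range(product + 2, product + primes[n])
```

Each member product + j has 2 ≤ j < $P_{n+1}$, so j has a prime factor among $P_1, \ldots, P_n$, and that prime divides the whole sum.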

One simple way to calculate all π(x) primes ≤ x is to form the so-called sieve of Eratosthenes: First write down all integers from 2 through x. Next circle 2, marking it prime, and cross out all other multiples of 2. Then repeatedly circle the smallest uncircled, uncrossed number and cross out its other multiples. When everything has been circled or crossed out, the circled numbers are the primes. For example when x = 10 we write down 2 through 10, circle 2, then cross out its multiples 4, 6, 8, and 10. Next 3 is the smallest uncircled, uncrossed number, so we circle it and cross out 6 and 9. Now 5 is smallest, so we circle it and cross out 10. Finally we circle 7. The circled numbers are 2, 3, 5, and 7; so these are the π(10) = 4 primes not exceeding 10.
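The circling and crossing-out procedure translates almost word for word into code. A minimal sketch, with hypothetical names `sieve`, `crossed`, and `circled`:

```python
def sieve(x):
    """Return the primes <= x using the sieve of Eratosthenes."""
    crossed = [False] * (x + 1)
    circled = []
    for k in range(2, x + 1):
        if not crossed[k]:
            circled.append(k)                    # smallest uncircled, uncrossed number
            for multiple in range(2 * k, x + 1, k):
                crossed[multiple] = True         # cross out its other multiples
    return circled
```

The length of the returned list is π(x).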


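“Je me sers de la notation très simple n! pour désigner le produit de nombres décroissans depuis n jusqu’à l’unité, savoir n(n − 1)(n − 2) ... 3.2.1. L’emploi continuel de l’analyse combinatoire que je fais dans la plupart de mes démonstrations, a rendu cette notation indispensable.”
- Ch. Kramp [186]
[Translation: “I use the very simple notation n! to denote the product of decreasing numbers from n down to unity, namely n(n − 1)(n − 2) ... 3.2.1. The continual use of combinatorial analysis that I make in most of my proofs has made this notation indispensable.”]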


4.4  FACTORIAL  FACTORS 

Now let’s take a look at the factorization of some interesting highly composite numbers, the factorials:

$n! = 1 \cdot 2 \cdot \ldots \cdot n = \prod_{k=1}^{n} k , \qquad \text{integer } n \ge 0 .$    (4.21)

According to our convention for an empty product, this defines 0! to be 1. Thus n! = (n − 1)! n for every positive integer n. This is the number of permutations of n distinct objects. That is, it’s the number of ways to arrange n things in a row: There are n choices for the first thing; for each choice of first thing, there are n − 1 choices for the second; for each of these n(n − 1) choices, there are n − 2 for the third; and so on, giving n(n − 1)(n − 2) ... (1) arrangements in all. Here are the first few values of the factorial function.


n   0  1  2  3  4   5    6    7     8      9       10
n!  1  1  2  6  24  120  720  5040  40320  362880  3628800

It’s useful to know a few factorial facts, like the first six or so values, and the fact that 10! is about 3½ million plus change; another interesting fact is that the number of digits in n! exceeds n when n ≥ 25.

We  can  prove  that  n!  is  plenty  big  by  using  something  like  Gauss’s  trick 
of  Chapter  1: 


$n!^2 = (1 \cdot 2 \cdot \ldots \cdot n)(n \cdot \ldots \cdot 2 \cdot 1) = \prod_{k=1}^{n} k(n+1-k) .$

We have $n \le k(n+1-k) \le \tfrac14 (n+1)^2$, since the quadratic polynomial $k(n+1-k) = \tfrac14 (n+1)^2 - \bigl(k - \tfrac12 (n+1)\bigr)^2$ has its smallest value at k = 1 and its largest value at $k = \tfrac12 (n+1)$. Therefore

$\prod_{k=1}^{n} n \;\le\; n!^2 \;\le\; \prod_{k=1}^{n} \dfrac{(n+1)^2}{4} ;$

that is,

$n^{n/2} \le n! \le \dfrac{(n+1)^n}{2^n} .$    (4.22)


This  relation  tells  us  that  the  factorial  function  grows  exponentially!! 
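The bounds in (4.22) and the digit-count remark can be verified exactly with integer arithmetic. A sketch, with hypothetical helper names; both inequalities are rewritten so that no fractional powers or divisions are needed.

```python
from math import factorial

def satisfies_422(n):
    """Check n**(n/2) <= n! <= (n+1)**n / 2**n exactly, using integers only."""
    f = factorial(n)
    lower_ok = n ** n <= f * f                   # squared form of n**(n/2) <= n!
    upper_ok = f * 2 ** n <= (n + 1) ** n        # n! <= (n+1)**n / 2**n with the denominator cleared
    return lower_ok and upper_ok

def digits(m):
    """Number of decimal digits of a positive integer m."""
    return len(str(m))
```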

To approximate n! more accurately for large n we can use Stirling’s formula, which we will derive in Chapter 9:

$n! \sim \sqrt{2\pi n} \left( \dfrac{n}{e} \right)^{n} .$    (4.23)

And a still more precise approximation tells us the asymptotic relative error: Stirling’s formula undershoots n! by a factor of about 1/(12n). Even for fairly small n this more precise estimate is pretty good. For example, Stirling’s approximation (4.23) gives a value near 3598696 when n = 10, and this is about 0.83% ≈ 1/120 too small. Good stuff, asymptotics.
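The n = 10 example can be reproduced in floating point. A minimal sketch; the function names are hypothetical.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * (n/e)**n from (4.23)."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

def relative_shortfall(n):
    """Fraction by which the approximation falls below the exact n!."""
    exact = math.factorial(n)
    return (exact - stirling(n)) / exact
```

For n = 10 the shortfall comes out close to 1/120, as the text says.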

But let’s get back to primes. We’d like to determine, for any given prime p, the largest power of p that divides n!; that is, we want the exponent of p in n!’s unique factorization. We denote this number by $\epsilon_p(n!)$, and we start our investigations with the small case p = 2 and n = 10. Since 10! is the product of ten numbers, $\epsilon_2(10!)$ can be found by summing the powers-of-2 contributions of those ten numbers; this calculation corresponds to summing the columns of the following array:


                 1  2  3  4  5  6  7  8  9  10 | powers of 2
divisible by 2      x     x     x     x     x  | 5 = ⌊10/2⌋
divisible by 4            x           x        | 2 = ⌊10/4⌋
divisible by 8                        x        | 1 = ⌊10/8⌋
powers of 2      0  1  0  2  0  1  0  3  0  1  | 8

(The column sums form what’s sometimes called the ruler function ρ(k), because of their similarity to the lengths of lines marking fractions of an inch on a ruler.) The sum of these ten sums is 8; hence $2^8$ divides 10! but $2^9$ doesn’t.

There’s also another way: We can sum the contributions of the rows. The first row marks the numbers that contribute a power of 2 (and thus are divisible by 2); there are ⌊10/2⌋ = 5 of them. The second row marks those that contribute an additional power of 2; there are ⌊10/4⌋ = 2 of them. And the third row marks those that contribute yet another; there are ⌊10/8⌋ = 1 of them. These account for all contributions, so we have $\epsilon_2(10!) = 5 + 2 + 1 = 8$. For general n this method gives

$\epsilon_2(n!) = \left\lfloor \dfrac{n}{2} \right\rfloor + \left\lfloor \dfrac{n}{4} \right\rfloor + \left\lfloor \dfrac{n}{8} \right\rfloor + \cdots = \sum_{k \ge 1} \left\lfloor \dfrac{n}{2^k} \right\rfloor .$

This sum is actually finite, since the summand is zero when $2^k > n$. Therefore it has only ⌊lg n⌋ nonzero terms, and it’s computationally quite easy. For instance, when n = 100 we have

$\epsilon_2(100!) = 50 + 25 + 12 + 6 + 3 + 1 = 97 .$

Each term is just the floor of half the previous term. This is true for all n, because as a special case of (3.11) we have $\lfloor n/2^{k+1} \rfloor = \lfloor \lfloor n/2^k \rfloor / 2 \rfloor$. It’s especially easy to see what’s going on here when we write the numbers in binary:

      100 = (1100100)_2 = 100
  ⌊100/2⌋ = (110010)_2  = 50
  ⌊100/4⌋ = (11001)_2   = 25
  ⌊100/8⌋ = (1100)_2    = 12
 ⌊100/16⌋ = (110)_2     = 6
 ⌊100/32⌋ = (11)_2      = 3
 ⌊100/64⌋ = (1)_2       = 1

We merely drop the least significant bit from one term to get the next.
The binary representation also shows us how to derive another formula,

$\epsilon_2(n!) = n - \nu_2(n) ,$    (4.24)

where $\nu_2(n)$ is the number of 1’s in the binary representation of n. This simplification works because each 1 that contributes $2^m$ to the value of n contributes $2^{m-1} + 2^{m-2} + \cdots + 2^0 = 2^m - 1$ to the value of $\epsilon_2(n!)$.
Generalizing  our  findings  to  an  arbitrary  prime  p,  we  have 


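$\epsilon_p(n!) = \left\lfloor \dfrac{n}{p} \right\rfloor + \left\lfloor \dfrac{n}{p^2} \right\rfloor + \left\lfloor \dfrac{n}{p^3} \right\rfloor + \cdots = \sum_{k \ge 1} \left\lfloor \dfrac{n}{p^k} \right\rfloor$    (4.25)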


by  the  same  reasoning  as  before. 
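Formula (4.25) and the bit-count shortcut (4.24) can both be checked with a few lines. A sketch, using the hypothetical names `epsilon` and `nu2`; the loop relies on the fact that each term is the floor of the previous term divided by p.

```python
def epsilon(p, n):
    """Exponent of the prime p in n!, summing floor(n / p**k) for k >= 1 as in (4.25)."""
    total = 0
    term = n // p
    while term > 0:
        total += term
        term //= p            # floor(n / p**(k+1)) equals floor(floor(n / p**k) / p)
    return total

def nu2(n):
    """Number of 1 bits in the binary representation of n."""
    return bin(n).count("1")
```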

About how large is $\epsilon_p(n!)$? We get an easy (but good) upper bound by simply removing the floor from the summand and then summing an infinite geometric progression:

$\epsilon_p(n!) < \dfrac{n}{p} + \dfrac{n}{p^2} + \dfrac{n}{p^3} + \cdots$
$\qquad = \dfrac{n}{p} \left( 1 + \dfrac{1}{p} + \dfrac{1}{p^2} + \cdots \right)$
$\qquad = \dfrac{n}{p} \left( \dfrac{p}{p-1} \right)$
$\qquad = \dfrac{n}{p-1} .$

For p = 2 and n = 100 this inequality says that 97 < 100. Thus the upper bound 100 is not only correct, it’s also close to the true value 97. In fact, the true value $n - \nu_2(n)$ is ∼ n in general, because $\nu_2(n) \le \lceil \lg n \rceil$ is asymptotically much smaller than n.

When p = 2 and 3 our formulas give $\epsilon_2(n!) \sim n$ and $\epsilon_3(n!) \sim n/2$, so it seems reasonable that every once in a while $\epsilon_3(n!)$ should be exactly half as big as $\epsilon_2(n!)$. For example, this happens when n = 6 and n = 7, because $6! = 2^4 \cdot 3^2 \cdot 5 = 7!/7$. But nobody has yet proved that such coincidences happen infinitely often.

The bound on $\epsilon_p(n!)$ in turn gives us a bound on $p^{\epsilon_p(n!)}$, which is p’s contribution to n!:

$p^{\epsilon_p(n!)} < p^{n/(p-1)} .$

And we can simplify this formula (at the risk of greatly loosening the upper bound) by noting that $p \le 2^{p-1}$; hence $p^{n/(p-1)} \le (2^{p-1})^{n/(p-1)} = 2^n$. In other words, the contribution that any prime makes to n! is less than $2^n$.
We can use this observation to get another proof that there are infinitely many primes. For if there were only the k primes 2, 3, ..., $P_k$, then we’d have $n! < (2^n)^k = 2^{nk}$ for all n > 1, since each prime can contribute at most a factor of $2^n - 1$. But we can easily contradict the inequality $n! < 2^{nk}$ by choosing n large enough, say $n = 2^{2k}$. Then

$n! < 2^{nk} = 2^{2^{2k} k} = n^{n/2} ,$

contradicting the inequality $n! \ge n^{n/2}$ that we derived in (4.22). There are infinitely many primes, still.

We can even beef up this argument to get a crude bound on π(n), the number of primes not exceeding n. Every such prime contributes a factor of less than $2^n$ to n!; so, as before,

$n! < 2^{n\pi(n)} .$

If we replace n! here by Stirling’s approximation (4.23), which is a lower bound, and take logarithms, we get

$n\pi(n) > n \lg(n/e) + \tfrac12 \lg(2\pi n) ;$

hence

$\pi(n) > \lg(n/e) .$

This lower bound is quite weak, compared with the actual value π(n) ∼ n/ln n, because log n is much smaller than n/log n when n is large. But we didn’t have to work very hard to get it, and a bound is a bound.
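Both the per-prime bound and the crude bound on π(n) can be spot-checked for small n. This is a sketch over an arbitrary small range, not a proof; all names are hypothetical.

```python
import math

def epsilon(p, n):
    """Exponent of the prime p in n!."""
    total, term = 0, n // p
    while term:
        total += term
        term //= p
    return total

def primes_up_to(n):
    """Primes <= n by trial division (adequate for small n)."""
    return [k for k in range(2, n + 1)
            if all(k % d for d in range(2, math.isqrt(k) + 1))]

def bounds_hold(n):
    primes = primes_up_to(n)
    # each prime's contribution p**epsilon_p(n!) is less than 2**n
    each_contribution_small = all(p ** epsilon(p, n) < 2 ** n for p in primes)
    # pi(n) > lg(n/e)
    crude_pi_bound = len(primes) > math.log2(n / math.e)
    return each_contribution_small and crude_pi_bound
```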


4.5  RELATIVE  PRIMALITY 


Like perpendicular lines don’t have a common direction, perpendicular numbers don’t have common factors.


When  gcd(m,  n)  = 1 , the  integers  m and  n have  no  prime  factors  in 
common  and  we  say  that  they’re  relatively  prime. 

This concept is so important in practice, we ought to have a special notation for it; but alas, number theorists haven’t come up with a very good one yet. Therefore we cry: Hear us, O Mathematicians of the World! LET US NOT WAIT ANY LONGER! WE CAN MAKE MANY FORMULAS CLEARER BY DEFINING A NEW NOTATION NOW! LET US AGREE TO WRITE ‘m ⊥ n’, AND TO SAY “m IS PRIME TO n,” IF m AND n ARE RELATIVELY PRIME. In other words, let us declare that

$m \perp n \iff m, n \text{ are integers and } \gcd(m,n) = 1 .$    (4.26)
A fraction m/n is in lowest terms if and only if m ⊥ n. Since we reduce fractions to lowest terms by casting out the largest common factor of numerator and denominator, we suspect that, in general,

$m/\gcd(m,n) \;\perp\; n/\gcd(m,n) ;$    (4.27)

and indeed this is true. It follows from a more general law, gcd(km, kn) = k gcd(m, n), proved in exercise 14.

The ⊥ relation has a simple formulation when we work with the prime-exponent representations of numbers, because of the gcd rule (4.14):

$m \perp n \iff \min(m_p, n_p) = 0 \text{ for all } p .$    (4.28)

Furthermore, since $m_p$ and $n_p$ are nonnegative, we can rewrite this as

$m \perp n \iff m_p n_p = 0 \text{ for all } p .$    (4.29)

And now we can prove an important law by which we can split and combine two ⊥ relations with the same left-hand side:

$k \perp m \text{ and } k \perp n \iff k \perp mn .$    (4.30)

In view of (4.29), this law is another way of saying that $k_p m_p = 0$ and $k_p n_p = 0$ if and only if $k_p(m_p + n_p) = 0$, when $m_p$ and $n_p$ are nonnegative.
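Laws (4.27) and (4.30) lend themselves to a brute-force check on small integers. A sketch with hypothetical function names, using the standard library's `math.gcd`:

```python
from math import gcd

def coprime(m, n):
    """True when m ⊥ n, that is, gcd(m, n) = 1."""
    return gcd(m, n) == 1

def law_430_holds(k, m, n):
    """k ⊥ m and k ⊥ n  if and only if  k ⊥ mn."""
    return (coprime(k, m) and coprime(k, n)) == coprime(k, m * n)

def law_427_holds(m, n):
    """m/gcd(m, n) is prime to n/gcd(m, n)."""
    g = gcd(m, n)
    return coprime(m // g, n // g)
```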

There’s a beautiful way to construct the set of all nonnegative fractions m/n with m ⊥ n, called the Stern-Brocot tree because it was discovered independently by Moritz Stern [279], a German mathematician, and Achille Brocot [35], a French clockmaker. The idea is to start with the two fractions (0/1, 1/0) and then to repeat the following operation as many times as desired:

Insert $\dfrac{m+m'}{n+n'}$ between two adjacent fractions $\dfrac{m}{n}$ and $\dfrac{m'}{n'}$.

The new fraction (m + m')/(n + n') is called the mediant of m/n and m'/n'. For example, the first step gives us one new entry between 0/1 and 1/0,

0/1, 1/1, 1/0;

and the next gives two more:

0/1, 1/2, 1/1, 2/1, 1/0.

The next gives four more,


The  dot  product  is 
zero,  like  orthogonal 
vectors. 


Interesting how mathematicians will say “discovered” when absolutely anyone else would have said “invented.”


0/1, 1/3, 1/2, 2/3, 1/1, 3/2, 2/1, 3/1, 1/0;
I guess 1/0 is infinity, “in lowest terms.”

Conserve parody.

and  then  we’ll  get  8,  16,  and  so  on.  The  entire  array  can  be  regarded  as  an 
infinite  binary  tree  structure  whose  top  levels  look  like  this: 


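[Figure: the top levels of the Stern-Brocot tree. Below the root 1/1 come 1/2 and 2/1; then 1/3, 2/3, 3/2, 3/1; then 1/4, 2/5, 3/5, 3/4, 4/3, 5/3, 5/2, 4/1.]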


Each fraction is (m + m')/(n + n'), where m/n is the nearest ancestor above and to the left, and m'/n' is the nearest ancestor above and to the right. (An “ancestor” is a fraction that’s reachable by following the branches upward.) Many patterns can be observed in this tree.

Why does this construction work? Why, for example, does each mediant fraction (m + m')/(n + n') turn out to be in lowest terms when it appears in this tree? (If m, m', n, and n' were all odd, we’d get even/even; somehow the construction guarantees that fractions with odd numerators and denominators never appear next to each other.) And why do all possible fractions m/n occur exactly once? Why can’t a particular fraction occur twice, or not at all?

All of these questions have amazingly simple answers, based on the following fundamental fact: If m/n and m'/n' are consecutive fractions at any stage of the construction, we have

$m'n - mn' = 1 .$    (4.31)

This relation is true initially (1·1 − 0·0 = 1); and when we insert a new mediant (m + m')/(n + n'), the new cases that need to be checked are

$(m + m')n - m(n + n') = 1 ;$
$m'(n + n') - (m + m')n' = 1 .$

Both of these equations are equivalent to the original condition (4.31) that they replace. Therefore (4.31) is invariant at all stages of the construction.

Furthermore, if m/n < m'/n' and if all values are nonnegative, it’s easy to verify that

$m/n < (m + m')/(n + n') < m'/n' .$

A mediant fraction isn’t halfway between its progenitors, but it does lie somewhere in between. Therefore the construction preserves order, and we couldn’t possibly get the same fraction in two different places.
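The stage-by-stage construction and the invariant (4.31) can be reproduced with pairs of integers. A sketch, assuming a fraction m/n is stored as the tuple `(m, n)` so that 1/0 causes no trouble; the names are hypothetical.

```python
def next_stage(fractions):
    """Insert the mediant between every pair of adjacent fractions."""
    result = [fractions[0]]
    for (m, n), (m2, n2) in zip(fractions, fractions[1:]):
        result.append((m + m2, n + n2))   # the mediant of m/n and m'/n'
        result.append((m2, n2))
    return result

def invariant_holds(fractions):
    """Check m'n - mn' = 1 for every adjacent pair, as in (4.31)."""
    return all(m2 * n - m * n2 == 1
               for (m, n), (m2, n2) in zip(fractions, fractions[1:]))

stage = [(0, 1), (1, 0)]          # the starting fractions 0/1 and 1/0
stages = [stage]
for _ in range(3):
    stage = next_stage(stage)
    stages.append(stage)
```

After three rounds, `stages[3]` is the nine-entry row that runs from 0/1 to 1/0.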

One question still remains. Can any positive fraction a/b with a ⊥ b possibly be omitted? The answer is no, because we can confine the construction to the immediate neighborhood of a/b, and in this region the behavior is easy to analyze: Initially we have


True, but if you get a compound fracture you’d better go see a doctor.


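$\dfrac{m}{n} = \dfrac{0}{1} < \left( \dfrac{a}{b} \right) < \dfrac{1}{0} = \dfrac{m'}{n'} ,$

where we put parentheses around a/b to indicate that it’s not really present yet. Then if at some stage we have

$\dfrac{m}{n} < \left( \dfrac{a}{b} \right) < \dfrac{m'}{n'} ,$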


the construction forms (m + m')/(n + n') and there are three cases. Either (m + m')/(n + n') = a/b and we win; or (m + m')/(n + n') < a/b and we can set m ← m + m', n ← n + n'; or (m + m')/(n + n') > a/b and we can set m' ← m + m', n' ← n + n'. This process cannot go on indefinitely, because the conditions

$\dfrac{a}{b} - \dfrac{m}{n} > 0 \quad\text{and}\quad \dfrac{m'}{n'} - \dfrac{a}{b} > 0$

imply that

$an - bm \ge 1 \quad\text{and}\quad bm' - an' \ge 1 ;$

hence

$(m' + n')(an - bm) + (m + n)(bm' - an') \ge m' + n' + m + n ;$

and this is the same as a + b ≥ m' + n' + m + n by (4.31). Either m or n or m' or n' increases at each step, so we must win after at most a + b steps.
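The three-case search can be followed step by step in code. A sketch, assuming a and b are positive and relatively prime; `locate` is a hypothetical name, and comparisons are done by cross-multiplying so that only integers are involved.

```python
from math import gcd

def locate(a, b):
    """Find a/b (positive, in lowest terms) by repeated mediants; return the step count."""
    if a <= 0 or b <= 0 or gcd(a, b) != 1:
        raise ValueError("a/b must be a positive fraction in lowest terms")
    m, n = 0, 1        # left neighbour m/n = 0/1
    m2, n2 = 1, 0      # right neighbour m'/n' = 1/0
    steps = 0
    while True:
        steps += 1
        mm, nn = m + m2, n + n2            # the mediant
        if mm * b == a * nn:               # mediant equals a/b: we win
            return steps
        if mm * b < a * nn:                # mediant < a/b: it becomes the new left neighbour
            m, n = mm, nn
        else:                              # mediant > a/b: it becomes the new right neighbour
            m2, n2 = mm, nn
```

For 5/7 the mediants visited are 1/1, 1/2, 2/3, 3/4, and 5/7, so the search takes five steps, well within the bound a + b = 12.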

The Farey series of order N, denoted by $\mathcal{F}_N$, is the set of all reduced fractions between 0 and 1 whose denominators are N or less, arranged in increasing order. For example, if N = 6 we have

$\mathcal{F}_6 = \tfrac01, \tfrac16, \tfrac15, \tfrac14, \tfrac13, \tfrac25, \tfrac12, \tfrac35, \tfrac23, \tfrac34, \tfrac45, \tfrac56, \tfrac11 .$

We can obtain $\mathcal{F}_N$ in general by starting with $\mathcal{F}_1 = \tfrac01, \tfrac11$ and then inserting mediants whenever it’s possible to do so without getting a denominator that is too large. We don’t miss any fractions in this way, because we know that the Stern-Brocot construction doesn’t miss any, and because a mediant with denominator ≤ N is never formed from a fraction whose denominator is > N. (In other words, $\mathcal{F}_N$ defines a subtree of the Stern-Brocot tree, obtained by pruning off unwanted branches.) It follows that m'n − mn' = 1 whenever m/n and m'/n' are consecutive elements of a Farey series.

Farey ’nough.

This method of construction reveals that $\mathcal{F}_N$ can be obtained in a simple way from $\mathcal{F}_{N-1}$: We simply insert the fraction (m + m')/N between consecutive fractions m/n, m'/n' of $\mathcal{F}_{N-1}$ whose denominators sum to N. For example, it’s easy to obtain $\mathcal{F}_7$ from the elements of $\mathcal{F}_6$, by inserting 1/7, 2/7, ..., 6/7 according to the stated rule:

$\mathcal{F}_7 = \tfrac01, \tfrac17, \tfrac16, \tfrac15, \tfrac14, \tfrac27, \tfrac13, \tfrac25, \tfrac37, \tfrac12, \tfrac47, \tfrac35, \tfrac23, \tfrac57, \tfrac34, \tfrac45, \tfrac56, \tfrac67, \tfrac11 .$

When N is prime, N − 1 new fractions will appear; but otherwise we’ll have fewer than N − 1, because this process generates only numerators that are relatively prime to N.
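The step from one Farey series to the next can be sketched with the standard library's `fractions.Fraction`. The function names are hypothetical.

```python
from fractions import Fraction

def next_farey(series, order):
    """Build the Farey series of the given order from the series of order - 1."""
    result = [series[0]]
    for left, right in zip(series, series[1:]):
        if left.denominator + right.denominator == order:
            # insert the mediant (m + m')/N, where N = order
            result.append(Fraction(left.numerator + right.numerator, order))
        result.append(right)
    return result

def farey(order):
    """Farey series of the given order, starting from the series of order 1."""
    series = [Fraction(0, 1), Fraction(1, 1)]
    for k in range(2, order + 1):
        series = next_farey(series, k)
    return series
```

Going from order 6 to order 7 adds six fractions, as expected for a prime order.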

Long ago in (4.5) we proved, in different words, that whenever m ⊥ n and 0 < m ≤ n we can find integers a and b such that

$ma - nb = 1 .$    (4.32)

(Actually we said m'm + n'n = gcd(m, n), but we can write 1 for gcd(m, n), a for m', and b for −n'.) The Farey series gives us another proof of (4.32), because we can let b/a be the fraction that precedes m/n in $\mathcal{F}_n$. Thus (4.5) is just (4.31) again. For example, one solution to 3a − 7b = 1 is a = 5, b = 2, since 2/5 precedes 3/7 in $\mathcal{F}_7$. This construction implies that we can always find a solution to (4.32) with 0 ≤ b < a ≤ n, if 0 < m ≤ n. Similarly, if 0 ≤ n < m and m ⊥ n, we can solve (4.32) with 0 < a ≤ b ≤ m by letting a/b be the fraction that follows n/m in $\mathcal{F}_m$.

Sequences  of  three  consecutive  terms  in  a Farey  series  have  an  amazing 
property  that  is  proved  in  exercise  61.  But  we  had  better  not  discuss  the 
Farey  series  any  further,  because  the  entire  Stern-Brocot  tree  turns  out  to  be 
even  more  interesting. 

We  can,  in  fact,  regard  the  Stern-Brocot  tree  as  a number  system  for 
representing  rational  numbers,  because  each  positive,  reduced  fraction  occurs 
exactly  once.  Let’s  use  the  letters  L and  R to  stand  for  going  down  to  the 
left  or  right  branch  as  we  proceed  from  the  root  of  the  tree  to  a particular 
fraction;  then  a string  of  L’s  and  R’s  uniquely  identifies  a place  in  the  tree. 
For example, LRRL means that we go left from $\frac11$ down to $\frac12$, then right to $\frac23$,
then right to $\frac34$, then left to $\frac57$. We can consider LRRL to be a representation
of $\frac57$. Every positive fraction gets represented in this way as a unique string
of  L’s  and  R’s. 

Well, actually there’s a slight problem: The fraction $\frac11$ corresponds to
the empty string, and we need a notation for that. Let’s agree to call it I,
because that looks something like 1 and it stands for “identity.”

This representation raises two natural questions: (1) Given positive integers
$m$ and $n$ with $m \perp n$, what is the string of L’s and R’s that corresponds
to $m/n$? (2) Given a string of L’s and R’s, what fraction corresponds to it?
Question 2 seems easier, so let’s work on it first. We define


f(S)  = fraction  corresponding  to  S 


when $S$ is a string of L’s and R’s. For example, $f(\mathrm{LRRL}) = \frac57$.

According to the construction, $f(S) = (m + m')/(n + n')$ if $m/n$ and
$m'/n'$ are the closest fractions preceding and following $S$ in the upper levels
of the tree. Initially $m/n = 0/1$ and $m'/n' = 1/0$; then we successively
replace either $m/n$ or $m'/n'$ by the mediant $(m + m')/(n + n')$ as we move
right or left in the tree, respectively.

How can we capture this behavior in mathematical formulas that are
easy to deal with? A bit of experimentation suggests that the best way is to
maintain a $2 \times 2$ matrix

$$M(S) = \begin{pmatrix} n & n' \\ m & m' \end{pmatrix}$$

that holds the four quantities involved in the ancestral fractions $m/n$ and
$m'/n'$ enclosing $S$. We could put the $m$’s on top and the $n$’s on the bottom,
fractionwise; but this upside-down arrangement works out more nicely because
we have $M(I) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ when the process starts, and $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is traditionally
called the identity matrix $I$.

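If you’re clueless about matrices, don’t panic; this book uses them only here.

A step to the left replaces $n'$ by $n + n'$ and $m'$ by $m + m'$; hence

$$M(SL) = \begin{pmatrix} n & n + n' \\ m & m + m' \end{pmatrix} = \begin{pmatrix} n & n' \\ m & m' \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = M(S) \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

(This is a special case of the general rule

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} w & x \\ y & z \end{pmatrix} = \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix}$$

for multiplying $2 \times 2$ matrices.) Similarly it turns out that

$$M(SR) = \begin{pmatrix} n + n' & n' \\ m + m' & m' \end{pmatrix} = M(S) \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

Therefore if we define L and R as $2 \times 2$ matrices,

$$L = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad R = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \tag{4.33}$$

we get the simple formula $M(S) = S$, by induction on the length of $S$. Isn’t
that nice? (The letters L and R serve dual roles, as matrices and as letters in
the string representation.) For example,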


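$$M(\mathrm{LRRL}) = LRRL = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix};$$

the ancestral fractions that enclose LRRL $= \frac57$ are $\frac23$ and $\frac34$. And this
construction gives us the answer to Question 2:

$$f(S) = f\left(\begin{pmatrix} n & n' \\ m & m' \end{pmatrix}\right) = \frac{m + m'}{n + n'}. \tag{4.34}$$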
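
The answer to Question 2 can be checked mechanically. Here is a minimal Python sketch, assuming matrices are stored as nested tuples; the helper names `matmul`, `M`, and `f` simply mirror the notation of the text and are otherwise our own choice.

```python
from fractions import Fraction

# 2x2 matrices as ((a, b), (c, d)); L and R are the matrices of (4.33).
L = ((1, 1), (0, 1))
R = ((1, 0), (1, 1))
I = ((1, 0), (0, 1))


def matmul(A, B):
    """Multiply two 2x2 matrices by the general rule quoted in the text."""
    (a, b), (c, d) = A
    (w, x), (y, z) = B
    return ((a * w + b * y, a * x + b * z), (c * w + d * y, c * x + d * z))


def M(s):
    """Matrix for a string of L's and R's (the empty string gives I)."""
    result = I
    for letter in s:
        result = matmul(result, L if letter == "L" else R)
    return result


def f(s):
    """Fraction corresponding to the string s, by (4.34)."""
    (n, n_prime), (m, m_prime) = M(s)
    return Fraction(m + m_prime, n + n_prime)
```

With these definitions `M("LRRL")` is the matrix with rows (3, 4) and (2, 3), and `f("LRRL")` is 5/7, in agreement with the worked example.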


How about Question 1? That’s easy, now that we understand the fundamental
connection between tree nodes and $2 \times 2$ matrices. Given a pair of
positive integers $m$ and $n$, with $m \perp n$, we can find the position of $m/n$ in
the Stern-Brocot tree by “binary search” as follows:

    S := I;
    while m/n ≠ f(S) do
        if m/n < f(S) then (output(L); S := SL)
        else (output(R); S := SR).

This outputs the desired string of L’s and R’s.
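
A runnable version of this binary search might look as follows. It is a sketch under our own naming (`path_by_binary_search`); instead of a matrix object it keeps the four entries $n$, $n'$, $m$, $m'$ of $S$ as the two enclosing fractions.

```python
from fractions import Fraction


def path_by_binary_search(m, n):
    """Return the string of L's and R's for the positive fraction m/n."""
    target = Fraction(m, n)
    lo_m, lo_n = 0, 1   # left ancestor m/n, initially 0/1
    hi_m, hi_n = 1, 0   # right ancestor m'/n', initially 1/0
    out = []
    while True:
        mediant = Fraction(lo_m + hi_m, lo_n + hi_n)  # this is f(S)
        if target == mediant:
            return "".join(out)
        if target < mediant:
            out.append("L")  # S := SL, the mediant becomes the right ancestor
            hi_m, hi_n = lo_m + hi_m, lo_n + hi_n
        else:
            out.append("R")  # S := SR, the mediant becomes the left ancestor
            lo_m, lo_n = lo_m + hi_m, lo_n + hi_n
```

For instance, `path_by_binary_search(5, 7)` walks through the mediants 1/1, 1/2, 2/3, 3/4 before stopping at 5/7.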

There’s also another way to do the same job, by changing $m$ and $n$ instead
of maintaining the state $S$. If $S$ is any $2 \times 2$ matrix, we have

$$f(RS) = f(S) + 1,$$

because $RS$ is like $S$ but with the top row added to the bottom row. (Let’s
look at it in slow motion:

$$S = \begin{pmatrix} n & n' \\ m & m' \end{pmatrix}; \qquad RS = \begin{pmatrix} n & n' \\ m + n & m' + n' \end{pmatrix};$$

hence $f(S) = (m + m')/(n + n')$ and $f(RS) = ((m + n) + (m' + n'))/(n + n')$.)
If we carry out the binary search algorithm on a fraction $m/n$ with $m > n$,
the first output will be R; hence the subsequent behavior of the algorithm will
have $f(S)$ exactly 1 greater than if we had begun with $(m - n)/n$ instead of
$m/n$. A similar property holds for L, and we have

$$\frac{m}{n} = f(RS) \iff \frac{m - n}{n} = f(S), \qquad \text{when } m > n;$$

$$\frac{m}{n} = f(LS) \iff \frac{m}{n - m} = f(S), \qquad \text{when } m < n.$$

This means that we can transform the binary search algorithm to the following
matrix-free procedure:

    while m ≠ n do
        if m < n then (output(L); n := n − m)
        else (output(R); m := m − n).

For example, given $m/n = 5/7$, we have successively

    m      = 5  5  3  1  1
    n      = 7  2  2  2  1
    output   L  R  R  L

in the simplified algorithm.
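
The matrix-free procedure translates almost word for word into code. The function name `path_by_subtraction` below is ours, and the sketch assumes that $m$ and $n$ are positive and relatively prime, as in the text.

```python
def path_by_subtraction(m, n):
    """Emit L or R while reducing the pair (m, n) by subtraction."""
    out = []
    while m != n:
        if m < n:
            out.append("L")
            n -= m
        else:
            out.append("R")
            m -= n
    return "".join(out)
```

Tracing `path_by_subtraction(5, 7)` reproduces the little table above: the pairs visited are (5, 7), (5, 2), (3, 2), (1, 2), (1, 1).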

Irrational numbers don’t appear in the Stern-Brocot tree, but all the
rational numbers that are “close” to them do. For example, if we try the
binary search algorithm with the number $e = 2.71828\ldots$, instead of with a
fraction $m/n$, we’ll get an infinite string of L’s and R’s that begins

    RRLRRLRLLLLRLRRRRRRLRLLLLLLLLRLR...

We can consider this infinite string to be the representation of $e$ in the
Stern-Brocot number system, just as we can represent $e$ as an infinite decimal
2.718281828459... or as an infinite binary fraction $(10.101101111110\ldots)_2$.
Incidentally, it turns out that $e$’s representation has a regular pattern in the
Stern-Brocot system:

$$e = RL^0RLR^2LRL^4RLR^6LRL^8RLR^{10}LRL^{12}RL\ldots;$$

this is equivalent to a special case of something that Euler [84] discovered
when he was 24 years old.

From this representation we can deduce that the fractions

    R R L R R L R L L L L R L R R R R R R

$$\tfrac11, \tfrac21, \tfrac31, \tfrac52, \tfrac83, \tfrac{11}{4}, \tfrac{19}{7}, \tfrac{30}{11}, \tfrac{49}{18}, \tfrac{68}{25}, \tfrac{87}{32}, \tfrac{106}{39}, \tfrac{193}{71}, \tfrac{299}{110}, \tfrac{492}{181}, \tfrac{685}{252}, \tfrac{878}{323}, \tfrac{1071}{394}, \tfrac{1264}{465}, \ldots$$

are the simplest rational upper and lower approximations to $e$. For if $m/n$
does not appear in this list, then some fraction in this list whose numerator
is $\le m$ and whose denominator is $\le n$ lies between $m/n$ and $e$. For example,
$\frac{27}{10}$ is not as simple an approximation as $\frac{19}{7} = 2.714\ldots$, which appears in
the list and is closer to $e$. We can see this because the Stern-Brocot tree
not only includes all rationals, it includes them in order, and because all
fractions with small numerator and denominator appear above all less simple
ones. Thus, $\frac{27}{10} = \mathrm{RRLRRLL}$ is less than $\frac{19}{7} = \mathrm{RRLRRL}$, which is less than
$e = \mathrm{RRLRRLR}\ldots$. Excellent approximations can be found in this way. For
example, $\frac{1264}{465} \approx 2.718280$ agrees with $e$ to six decimal places; we obtained this
fraction from the first 19 letters of $e$’s Stern-Brocot representation, and the
accuracy is about what we would get with 19 bits of $e$’s binary representation.
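
The list of approximations can be regenerated from the letters alone. In the sketch below, `prefix_fractions` is a name of our own; for each letter it records the fraction $f(S)$ of the string read so far and then applies the letter.

```python
from fractions import Fraction


def prefix_fractions(letters):
    """Return f(S) for the empty string and every proper prefix of letters."""
    lo_m, lo_n = 0, 1   # left ancestor
    hi_m, hi_n = 1, 0   # right ancestor
    fractions = []
    for letter in letters:
        fractions.append(Fraction(lo_m + hi_m, lo_n + hi_n))
        if letter == "L":
            hi_m, hi_n = lo_m + hi_m, lo_n + hi_n
        else:
            lo_m, lo_n = lo_m + hi_m, lo_n + hi_n
    return fractions


# The first 19 letters of e's representation, as quoted in the text.
approximations = prefix_fractions("RRLRRLRLLLLRLRRRRRR")
```

Each letter tells us on which side of $e$ the fraction beneath it lies: an R means the fraction is too small, an L that it is too large.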

We can find the infinite representation of an irrational number $\alpha$ by a
simple modification of the matrix-free binary search procedure:

    if α < 1 then (output(L); α := α/(1 − α))
    else (output(R); α := α − 1).

(These steps are to be repeated infinitely many times, or until we get tired.)
If $\alpha$ is rational, the infinite representation obtained in this way is the same as
before but with $RL^\infty$ appended at the right of $\alpha$’s (finite) representation. For
example, if $\alpha = 1$, we get RLLL..., corresponding to the infinite sequence of
fractions $\frac11, \frac21, \frac32, \frac43, \frac54, \ldots$, which approach 1 in the limit. This situation is
exactly analogous to ordinary binary notation, if we think of L as 0 and R as 1:
Just as every real number $x$ in $[0, 1)$ has an infinite binary representation
$(.b_1b_2b_3\ldots)_2$ not ending with all 1’s, every real number $\alpha$ in $[0, \infty)$ has
an infinite Stern-Brocot representation $B_1B_2B_3\ldots$ not ending with all R’s.
Thus we have a one-to-one order-preserving correspondence between $[0, 1)$
and $[0, \infty)$ if we let $0 \leftrightarrow L$ and $1 \leftrightarrow R$.
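
Here is one way to run this procedure on a computer. It is only a sketch: the generator name `stern_brocot_letters` is ours, and since a machine cannot hold $e$ exactly we substitute a 39-digit decimal approximation, stored as an exact `Fraction`. That is an assumption of this example, adequate for the first few dozen letters but not beyond.

```python
from fractions import Fraction
from itertools import islice


def stern_brocot_letters(alpha):
    """Generate the infinite L/R representation of a number alpha >= 0."""
    while True:
        if alpha < 1:
            yield "L"
            alpha = alpha / (1 - alpha)
        else:
            yield "R"
            alpha = alpha - 1


# Assumption: a rational stand-in for e, accurate to 39 decimal places.
E_APPROX = Fraction("2.718281828459045235360287471352662497757")

first_letters = "".join(islice(stern_brocot_letters(E_APPROX), 32))
```

Feeding the generator the exact rational 1 instead yields R followed by an endless run of L’s, which is the $RL^\infty$ tail described above.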

There’s an intimate relationship between Euclid’s algorithm and the
Stern-Brocot representations of rationals. Given $\alpha = m/n$, we get $\lfloor m/n \rfloor$
R’s, then $\lfloor n/(m \bmod n) \rfloor$ L’s, then $\lfloor (m \bmod n)/(n \bmod (m \bmod n)) \rfloor$ R’s,
and so on. These numbers $m \bmod n$, $n \bmod (m \bmod n)$, ... are just the values
examined in Euclid’s algorithm. (A little fudging is needed at the end
to make sure that there aren’t infinitely many R’s.) We will explore this
relationship further in Chapter 6.
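
The run lengths can be read off from Euclid’s algorithm in a few lines. The sketch below uses our own name `path_by_euclid`, and it makes one assumption that the text leaves unstated: we take the “fudging” to mean that the final run is shortened by one letter, which gives the finite representation of $m/n$.

```python
def path_by_euclid(m, n):
    """Build the L/R string of m/n (m, n > 0) from Euclid's quotients."""
    runs = []
    letter = "R"
    while n != 0:
        runs.append([letter, m // n])  # floor(m/n) copies of the current letter
        m, n = n, m % n                # one step of Euclid's algorithm
        letter = "L" if letter == "R" else "R"
    runs[-1][1] -= 1                   # assumed fudge: drop one final letter
    return "".join(ch * count for ch, count in runs)
```

For 5/7 the quotients are 0, 1, 2, 2, so the runs are no R’s, one L, two R’s, and (after the adjustment) one L.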


4.6  ‘MOD’:  THE  CONGRUENCE  RELATION


“Numerorum congruentiam hoc signo, $\equiv$, in posterum denotabimus, modulum
ubi opus erit in clausulis adiungentes, $-16 \equiv 9$ (mod. 5), $-7 \equiv 15$ (mod. 11).”

-C. F. Gauss [115]


Modular arithmetic is one of the main tools provided by number
theory. We got a glimpse of it in Chapter 3 when we used the binary operation
‘mod’, usually as one operation amidst others in an expression. In this chapter
we will use ‘mod’ also with entire equations, for which a slightly different
notation is more convenient:

$$a \equiv b \pmod{m} \iff a \bmod m = b \bmod m. \tag{4.35}$$

For example, $9 \equiv -16 \pmod 5$, because $9 \bmod 5 = 4 = (-16) \bmod 5$. The
formula ‘$a \equiv b \pmod m$’ can be read “$a$ is congruent to $b$ modulo $m$.” The
definition makes sense when $a$, $b$, and $m$ are arbitrary real numbers, but we
almost always use it with integers only.

Since $x \bmod m$ differs from $x$ by a multiple of $m$, we can understand
congruences in another way:

$$a \equiv b \pmod{m} \iff a - b \text{ is a multiple of } m. \tag{4.36}$$

For if $a \bmod m = b \bmod m$, then the definition of ‘mod’ in (3.21) tells us
that $a - b = a \bmod m + km - (b \bmod m + lm) = (k - l)m$ for some integers
$k$ and $l$. Conversely if $a - b = km$, then $a = b$ if $m = 0$; otherwise

$$a \bmod m = a - \lfloor a/m \rfloor m = b + km - \lfloor (b + km)/m \rfloor m = b - \lfloor b/m \rfloor m = b \bmod m.$$

The characterization of $\equiv$ in (4.36) is often easier to apply than (4.35). For
example, we have $8 \equiv 23 \pmod 5$ because $8 - 23 = -15$ is a multiple of 5; we
don’t have to compute both $8 \bmod 5$ and $23 \bmod 5$.
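
Both characterizations are one-liners in a language whose remainder operator follows the floor convention of (3.21); Python’s `%` does so for integers with a positive modulus. The function names below are ours.

```python
def congruent_by_mod(a, b, m):
    """Definition (4.35): a mod m equals b mod m."""
    return a % m == b % m


def congruent_by_difference(a, b, m):
    """Characterization (4.36): a - b is a multiple of m."""
    return (a - b) % m == 0
```

The second form needs only one remainder computation, which is the convenience the text points out for $8 \equiv 23 \pmod 5$.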

The congruence sign ‘$\equiv$’ looks conveniently like ‘$=$’, because congruences
are almost like equations. For example, congruence is an equivalence
relation; that is, it satisfies the reflexive law ‘$a \equiv a$’, the symmetric law
‘$a \equiv b \Rightarrow b \equiv a$’, and the transitive law ‘$a \equiv b \equiv c \Rightarrow a \equiv c$’.
All these properties are easy to prove, because any relation ‘$\equiv$’ that satisfies
‘$a \equiv b \iff f(a) = f(b)$’ for some function $f$ is an equivalence relation. (In
our case, $f(x) = x \bmod m$.) Moreover, we can add and subtract congruent
elements without losing congruence:

$$a \equiv b \text{ and } c \equiv d \implies a + c \equiv b + d \pmod{m};$$

$$a \equiv b \text{ and } c \equiv d \implies a - c \equiv b - d \pmod{m}.$$

For if $a - b$ and $c - d$ are both multiples of $m$, so are
$(a + c) - (b + d) = (a - b) + (c - d)$ and $(a - c) - (b - d) = (a - b) - (c - d)$.
Incidentally, it isn’t necessary to write ‘(mod $m$)’ once for every appearance of ‘$\equiv$’; if the
modulus is constant, we need to name it only once in order to establish the
context. This is one of the great conveniences of congruence notation.
Multiplication works too, provided that we are dealing with integers:

$$a \equiv b \text{ and } c \equiv d \implies ac \equiv bd \pmod{m}, \qquad \text{integers } b, c.$$

Proof: $ac - bd = (a - b)c + b(c - d)$. Repeated application of this
multiplication property now allows us to take powers:

$$a \equiv b \implies a^n \equiv b^n \pmod{m}, \qquad \text{integers } a, b; \text{ integer } n \ge 0.$$

“I feel fine today modulo a slight headache.”
The Hacker’s Dictionary [277]

For example, since $2 \equiv -1 \pmod 3$, we have $2^n \equiv (-1)^n \pmod 3$; this means
that $2^n - 1$ is a multiple of 3 if and only if $n$ is even.
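
The power rule says that we may reduce after every multiplication without changing the final residue. A small sketch, with `power_mod` as our own name for the obvious loop:

```python
def power_mod(a, n, m):
    """Compute a**n mod m (n >= 0, m > 0), reducing after each multiplication."""
    result = 1 % m
    for _ in range(n):
        result = (result * a) % m
    return result
```

Because $2 \equiv -1 \pmod 3$, `power_mod(2, n, 3)` alternates between 1 and 2 as `n` runs through 0, 1, 2, ..., which is the parity statement about $2^n - 1$ made above.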

Thus, most of the algebraic operations that we customarily do with equations
can also be done with congruences. Most, but not all. The operation
of division is conspicuously absent. If $ad \equiv bd \pmod m$, we can’t always
conclude that $a \equiv b$. For example, $3 \cdot 2 \equiv 5 \cdot 2 \pmod 4$, but $3 \not\equiv 5$.

We can salvage the cancellation property for congruences, however, in
the common case that $d$ and $m$ are relatively prime:

$$ad \equiv bd \iff a \equiv b \pmod{m}, \qquad \text{integers } a, b, d, m \text{ and } d \perp m. \tag{4.37}$$

For example, it’s legit to conclude from $15 \equiv 35 \pmod m$ that $3 \equiv 7 \pmod m$,
unless the modulus $m$ is a multiple of 5.

To prove this property, we use the extended gcd law (4.5) again, finding
$d'$ and $m'$ such that $d'd + m'm = 1$. Then if $ad \equiv bd$ we can multiply
both sides of the congruence by $d'$, obtaining $ad'd \equiv bd'd$. Since $d'd \equiv 1$,
we have $ad'd \equiv a$ and $bd'd \equiv b$; hence $a \equiv b$. This proof shows that the
number $d'$ acts almost like $1/d$ when congruences are considered (mod $m$);
therefore we call it the “inverse of $d$ modulo $m$.”
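
The inverse $d'$ can be computed with the extended form of Euclid’s algorithm. The following is a sketch with names of our own (`extended_gcd`, `inverse_mod`), written for positive $d$ and $m$; it is one standard way to obtain the coefficients of (4.5), not necessarily the formulation used earlier in the book.

```python
def extended_gcd(d, m):
    """Return (g, d1, m1) with d1*d + m1*m == g == gcd(d, m), for d, m > 0."""
    old_r, r = d, m
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def inverse_mod(d, m):
    """Return d' in [0, m) with d'*d congruent to 1 (mod m); needs gcd(d, m) == 1."""
    g, d1, _ = extended_gcd(d, m)
    if g != 1:
        raise ValueError("d and m are not relatively prime")
    return d1 % m
```

Multiplying both sides of $ad \equiv bd$ by `inverse_mod(d, m)` performs exactly the cancellation of (4.37).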

Another way to apply division to congruences is to divide the modulus
as well as the other numbers:

$$ad \equiv bd \pmod{md} \iff a \equiv b \pmod{m}, \qquad \text{for } d \ne 0. \tag{4.38}$$

This law holds for all real $a$, $b$, $d$, and $m$, because it depends only on the
distributive law $(a \bmod m)d = ad \bmod md$: We have $a \bmod m = b \bmod m$
$\iff (a \bmod m)d = (b \bmod m)d \iff ad \bmod md = bd \bmod md$. Thus,
for example, from $3 \cdot 2 \equiv 5 \cdot 2 \pmod 4$ we conclude that $3 \equiv 5 \pmod 2$.

We can combine (4.37) and (4.38) to get a general law that changes the
modulus as little as possible:

$$ad \equiv bd \pmod{m} \iff a \equiv b \pmod{\frac{m}{\gcd(d, m)}}, \qquad \text{integers } a, b, d, m. \tag{4.39}$$

For we can multiply $ad \equiv bd$ by $d'$, where $d'd + m'm = \gcd(d, m)$; this gives
the congruence $a \cdot \gcd(d, m) \equiv b \cdot \gcd(d, m) \pmod m$, which can be divided
by $\gcd(d, m)$.
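
A brute-force check is a cheap way to gain confidence in a law like (4.39). The helper below, whose name `law_4_39_holds` is ours, compares the two sides for given integers and a positive modulus.

```python
from math import gcd


def law_4_39_holds(a, b, d, m):
    """Compare both sides of (4.39) for integers a, b, d and modulus m > 0."""
    left = (a * d - b * d) % m == 0
    reduced_modulus = m // gcd(d, m)
    right = (a - b) % reduced_modulus == 0
    return left == right
```

With $a = 3$, $b = 5$, $d = 2$, $m = 4$ the reduced modulus is 2, matching the earlier example in which $3 \cdot 2 \equiv 5 \cdot 2 \pmod 4$ yields $3 \equiv 5 \pmod 2$.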

Let’s look a bit further into this idea of changing the modulus. If we
know that $a \equiv b \pmod{100}$, then we also must have $a \equiv b \pmod{10}$, or
modulo any divisor of 100. It’s stronger to say that $a - b$ is a multiple of 100
than to say that it’s a multiple of 10. In general,

$$a \equiv b \pmod{md} \implies a \equiv b \pmod{m}, \qquad \text{integer } d, \tag{4.40}$$

because any multiple of $md$ is a multiple of $m$.

Conversely, if we know that $a \equiv b$ with respect to two small moduli, can
we conclude that $a \equiv b$ with respect to a larger one? Yes; the rule is

$$a \equiv b \pmod{m} \text{ and } a \equiv b \pmod{n} \iff a \equiv b \pmod{\operatorname{lcm}(m, n)}, \qquad \text{integers } m, n > 0. \tag{4.41}$$

For example, if we know that $a \equiv b$ modulo 12 and 18, we can safely conclude
that $a \equiv b \pmod{36}$. The reason is that if $a - b$ is a common multiple of $m$
and $n$, it is a multiple of $\operatorname{lcm}(m, n)$. This follows from the principle of unique
factorization.
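
The rule (4.41) can be spot-checked in the same spirit. In this sketch `lcm` is computed from `gcd` so that the code does not depend on a recent Python version; both function names are ours.

```python
from math import gcd


def lcm(m, n):
    """Least common multiple of positive integers m and n."""
    return m * n // gcd(m, n)


def law_4_41_holds(a, b, m, n):
    """Compare both sides of (4.41) for integers a, b and moduli m, n > 0."""
    both_small = (a - b) % m == 0 and (a - b) % n == 0
    one_large = (a - b) % lcm(m, n) == 0
    return both_small == one_large
```

For the moduli 12 and 18 the combined modulus is `lcm(12, 18)`, which is 36 and not the product 216.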

The special case $m \perp n$ of this law is extremely important, because
$\operatorname{lcm}(m, n) = mn$ when $m$ and $n$ are relatively prime. Therefore we will state
it explicitly:

$$a \equiv b \pmod{mn} \iff a \equiv b \pmod{m} \text{ and } a \equiv b \pmod{n}, \qquad \text{if } m \perp n. \tag{4.42}$$

For example, $a \equiv b \pmod{100}$ if and only if $a \equiv b \pmod{25}$ and $a \equiv b \pmod 4$.
Saying this another way, if we know $x \bmod 25$ and $x \bmod 4$, then
we have enough facts to determine $x \bmod 100$. This is a special case of the
Chinese Remainder Theorem (see exercise 30), so called because it was
discovered by Sun Tsŭ in China, about A.D. 350.

The moduli $m$ and $n$ in (4.42) can be further decomposed into relatively
prime factors until every distinct prime has been isolated. Therefore

$$a \equiv b \pmod{m} \iff a \equiv b \pmod{p^{m_p}} \text{ for all } p,$$

if the prime factorization (4.11) of $m$ is $\prod_p p^{m_p}$. Congruences modulo powers
of primes are the building blocks for all congruences modulo integers.

4.7  INDEPENDENT  RESIDUES 

One of the important applications of congruences is a residue number
system, in which an integer $x$ is represented as a sequence of residues (or
remainders) with respect to moduli that are prime to each other:

$$\mathrm{Res}(x) = (x \bmod m_1, \ldots, x \bmod m_r), \qquad \text{if } m_j \perp m_k \text{ for } 1 \le j < k \le r.$$

Modulitos?

Knowing $x \bmod m_1$, ..., $x \bmod m_r$ doesn’t tell us everything about $x$. But
it does allow us to determine $x \bmod m$, where $m$ is the product $m_1 \ldots m_r$.
In practical applications we’ll often know that $x$ lies in a certain range; then
we’ll know everything about $x$ if we know $x \bmod m$ and if $m$ is large enough.

For  example,  let’s  look  at  a small  case  of  a residue  number  system  that 
has  only  two  moduli,  3 and  5: 


    x mod 15    x mod 3    x mod 5
        0          0          0
        1          1          1
        2          2          2
        3          0          3
        4          1          4
        5          2          0
        6          0          1
        7          1          2
        8          2          3
        9          0          4
       10          1          0
       11          2          1
       12          0          2
       13          1          3
       14          2          4

Each ordered pair $(x \bmod 3, x \bmod 5)$ is different, because $x \bmod 3 = y \bmod 3$
and $x \bmod 5 = y \bmod 5$ if and only if $x \bmod 15 = y \bmod 15$.

We can perform addition, subtraction, and multiplication on the two
components independently, because of the rules of congruences. For example,
if we want to multiply $7 = (1, 2)$ by $13 = (1, 3)$ modulo 15, we calculate
$1 \cdot 1 \bmod 3 = 1$ and $2 \cdot 3 \bmod 5 = 1$. The answer is $(1, 1) = 1$; hence $7 \cdot 13 \bmod 15$
must equal 1. Sure enough, it does.

This independence principle is useful in computer applications, because
different components can be worked on separately (for example, by different
computers). If each modulus $m_k$ is a distinct prime $p_k$, chosen to be
slightly less than $2^{31}$, then a computer whose basic arithmetic operations
handle integers in the range $[-2^{31}, 2^{31})$ can easily compute sums, differences,
and products modulo $p_k$. A set of $r$ such primes makes it possible to add,
subtract, and multiply “multiple-precision numbers” of up to almost $31r$ bits,
and the residue system makes it possible to do this faster than if such large
numbers were added, subtracted, or multiplied in other ways.

For example, the Mersenne prime $2^{31} - 1$ works well.
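
Componentwise arithmetic in a residue number system takes only a few lines. The sketch below uses the toy moduli 3 and 5 of the table; the names `to_residues`, `add`, and `multiply` are ours.

```python
MODULI = (3, 5)  # pairwise relatively prime; their product is 15


def to_residues(x, moduli=MODULI):
    """Represent x by its residues with respect to each modulus."""
    return tuple(x % m for m in moduli)


def add(xs, ys, moduli=MODULI):
    """Add two residue tuples component by component."""
    return tuple((x + y) % m for x, y, m in zip(xs, ys, moduli))


def multiply(xs, ys, moduli=MODULI):
    """Multiply two residue tuples component by component."""
    return tuple((x * y) % m for x, y, m in zip(xs, ys, moduli))
```

Multiplying `to_residues(7)` by `to_residues(13)` gives the pair (1, 1), which is also `to_residues(7 * 13)`; no component ever needs to know about the other.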

We can even do division, in appropriate circumstances. For example,
suppose we want to compute the exact value of a large determinant of integers.
The result will be an integer $D$, and bounds on $|D|$ can be given based on the
size of its entries. But the only fast ways known for calculating determinants
require division, and this leads to fractions (and loss of accuracy, if we resort
to binary approximations). The remedy is to evaluate $D \bmod p_k = D_k$, for
various large primes $p_k$. We can safely divide modulo $p_k$ unless the divisor
happens to be a multiple of $p_k$. That’s very unlikely, but if it does happen we
can choose another prime. Finally, knowing $D_k$ for sufficiently many primes,
we’ll have enough information to determine $D$.

But we haven’t explained how to get from a given sequence of residues
$(x \bmod m_1, \ldots, x \bmod m_r)$ back to $x \bmod m$. We’ve shown that this conversion
can be done in principle, but the calculations might be so formidable
that they might rule out the idea in practice. Fortunately, there is a reasonably
simple way to do the job, and we can illustrate it in the situation
$(x \bmod 3, x \bmod 5)$ shown in our little table. The key idea is to solve the
problem in the two cases $(1, 0)$ and $(0, 1)$; for if $(1, 0) = a$ and $(0, 1) = b$, then
$(x, y) = (ax + by) \bmod 15$, since congruences can be multiplied and added.

In our case $a = 10$ and $b = 6$, by inspection of the table; but how could
we find $a$ and $b$ when the moduli are huge? In other words, if $m \perp n$, what
is a good way to find numbers $a$ and $b$ such that the equations

$$a \bmod m = 1, \quad a \bmod n = 0, \quad b \bmod m = 0, \quad b \bmod n = 1$$

all hold? Once again, (4.5) comes to the rescue: With Euclid’s algorithm, we
can find $m'$ and $n'$ such that

$$m'm + n'n = 1.$$

Therefore we can take $a = n'n$ and $b = m'm$, reducing them both mod $mn$
if desired.
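
The recipe $a = n'n$, $b = m'm$ can be sketched as follows for two moduli. The function names are ours, and `extended_gcd` is a standard iterative formulation that we assume here in place of whatever form (4.5) took earlier in the book.

```python
def extended_gcd(m, n):
    """Return (g, m1, n1) with m1*m + n1*n == g == gcd(m, n), for m, n > 0."""
    old_r, r = m, n
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def basis(m, n):
    """Return (a, b) with residues (1, 0) and (0, 1), reduced mod m*n."""
    g, m1, n1 = extended_gcd(m, n)
    if g != 1:
        raise ValueError("moduli must be relatively prime")
    return (n1 * n) % (m * n), (m1 * m) % (m * n)


def from_residues(x, y, m, n):
    """Recover the number mod m*n whose residues are x (mod m) and y (mod n)."""
    a, b = basis(m, n)
    return (a * x + b * y) % (m * n)
```

For the moduli 3 and 5 this reproduces the values $a = 10$ and $b = 6$ found by inspecting the table.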

Further  tricks  are  needed  in  order  to  minimize  the  calculations  when  the 
moduli  are  large;  the  details  are  beyond  the  scope  of  this  book,  but  they  can 
be found in [174, page 274]. Conversion from residues to the corresponding
original  numbers  is  feasible,  but  it  is  sufficiently  slow  that  we  save  total  time 
only  if  a sequence  of  operations  can  all  be  done  in  the  residue  number  system 
before  converting  back. 

Let’s firm up these congruence ideas by trying to solve a little problem:
How many solutions are there to the congruence

$$x^2 \equiv 1 \pmod{m}, \tag{4.43}$$

if we consider two solutions $x$ and $x'$ to be the same when $x \equiv x'$?

According to the general principles explained earlier, we should consider
first the case that $m$ is a prime power, $p^k$, where $k > 0$. Then the congruence
$x^2 \equiv 1$ can be written

$$(x - 1)(x + 1) \equiv 0 \pmod{p^k},$$

so $p$ must divide either $x - 1$ or $x + 1$, or both. But $p$ can’t divide both
$x - 1$ and $x + 1$ unless $p = 2$; we’ll leave that case for later. If $p > 2$, then
$p^k \backslash (x - 1)(x + 1) \iff p^k \backslash (x - 1)$ or $p^k \backslash (x + 1)$; so there are exactly two
solutions, $x \equiv +1$ and $x \equiv -1$.

All primes are odd except 2, which is the oddest of all.

Mathematicians love to say that things are trivial.

The case $p = 2$ is a little different. If $2^k \backslash (x - 1)(x + 1)$ then either $x - 1$
or $x + 1$ is divisible by 2 but not by 4, so the other one must be divisible
by $2^{k-1}$. This means that we have four solutions when $k \ge 3$, namely $x \equiv \pm 1$
and $x \equiv 2^{k-1} \pm 1$. (For example, when $p^k = 8$ the four solutions are $x \equiv 1$, 3,
5, 7 (mod 8); it’s often useful to know that the square of any odd integer has
the form $8n + 1$.)

Now $x^2 \equiv 1 \pmod m$ if and only if $x^2 \equiv 1 \pmod{p^{m_p}}$ for all primes $p$
with $m_p > 0$ in the complete factorization of $m$. Each prime is independent
of the others, and there are exactly two possibilities for $x \bmod p^{m_p}$ except
when $p = 2$. Therefore if $m$ has exactly $r$ different prime divisors, the total
number of solutions to $x^2 \equiv 1$ is $2^r$, except for a correction when $m$ is even.
The exact number in general is

$$2^{r + [8 \backslash m] + [4 \backslash m] - [2 \backslash m]}. \tag{4.44}$$

For example, there are four “square roots of unity modulo 12,” namely 1, 5,
7, and 11. When $m = 15$ the four are those whose residues mod 3 and mod 5
are $\pm 1$, namely $(1, 1)$, $(1, 4)$, $(2, 1)$, and $(2, 4)$ in the residue number system.
These solutions are 1, 4, 11, and 14 in the ordinary (decimal) number system.
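
Formula (4.44) is easy to compare against a direct search for small moduli. In the sketch below all function names are ours; the bracketed conditions of (4.44) become Boolean values, which Python treats as 0 or 1 in arithmetic.

```python
def roots_of_unity(m):
    """All x in [0, m) with x*x congruent to 1 (mod m), found by search."""
    return [x for x in range(m) if (x * x - 1) % m == 0]


def distinct_prime_divisors(m):
    """Number r of different primes dividing m, by trial division."""
    count, p = 0, 2
    while p * p <= m:
        if m % p == 0:
            count += 1
            while m % p == 0:
                m //= p
        p += 1
    return count + (1 if m > 1 else 0)


def count_by_formula(m):
    """Number of solutions predicted by (4.44)."""
    r = distinct_prime_divisors(m)
    exponent = r + (m % 8 == 0) + (m % 4 == 0) - (m % 2 == 0)
    return 2 ** exponent
```

For $m = 12$ we have $r = 2$, and the three bracket terms contribute $0 + 1 - 1$, so the formula gives the four roots listed above.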

4.8  ADDITIONAL  APPLICATIONS 

There’s some unfinished business left over from Chapter 3: We wish
to prove that the $m$ numbers

$$0 \bmod m, \quad n \bmod m, \quad 2n \bmod m, \quad \ldots, \quad (m - 1)n \bmod m \tag{4.45}$$

consist of precisely $d$ copies of the $m/d$ numbers

$$0, \quad d, \quad 2d, \quad \ldots, \quad m - d$$

in some order, where $d = \gcd(m, n)$. For example, when $m = 12$ and $n = 8$
we have $d = 4$, and the numbers are 0, 8, 4, 0, 8, 4, 0, 8, 4, 0, 8, 4.

The first part of the proof (to show that we get $d$ copies of the first
$m/d$ values) is now trivial. We have

$$jn \equiv kn \pmod{m} \iff j(n/d) \equiv k(n/d) \pmod{m/d}$$

by (4.38); hence we get $d$ copies of the values that occur when $0 \le k < m/d$.

Now we must show that those $m/d$ numbers are $\{0, d, 2d, \ldots, m - d\}$
in some order. Let’s write $m = m'd$ and $n = n'd$. Then
$kn \bmod m = d(kn' \bmod m')$, by the distributive law (3.23); so the values that occur when
$0 \le k < m'$ are $d$ times the numbers

$$0 \bmod m', \quad n' \bmod m', \quad 2n' \bmod m', \quad \ldots, \quad (m' - 1)n' \bmod m'.$$

But we know that $m' \perp n'$ by (4.27); we’ve divided out their gcd. Therefore
we need only consider the case $d = 1$, namely the case that $m$ and $n$ are
relatively prime.

So let’s assume that $m \perp n$. In this case it’s easy to see that the numbers
(4.45) are just $\{0, 1, \ldots, m - 1\}$ in some order, by using the “pigeonhole
principle.” This principle states that if $m$ pigeons are put into $m$ pigeonholes,
there is an empty hole if and only if there’s a hole with more than one pigeon.
(Dirichlet’s box principle, proved in exercise 3.8, is similar.) We know that
the numbers (4.45) are distinct, because

$$jn \equiv kn \pmod{m} \iff j \equiv k \pmod{m}$$

when $m \perp n$; this is (4.37). Therefore the $m$ different numbers must fill all the
pigeonholes 0, 1, ..., $m - 1$. Therefore the unfinished business of Chapter 3
is finished.

The proof is complete, but we can prove even more if we use a direct
method instead of relying on the indirect pigeonhole argument. If $m \perp n$ and
if a value $j \in [0, m)$ is given, we can explicitly compute $k \in [0, m)$ such that
$kn \bmod m = j$ by solving the congruence

$$kn \equiv j \pmod{m}$$

for $k$. We simply multiply both sides by $n'$, where $m'm + n'n = 1$, to get

$$k \equiv jn' \pmod{m};$$

hence $k = jn' \bmod m$.
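
Both halves of this discussion can be watched in action. In the sketch below the names `multiples_mod` and `solve_for_k` are ours, and the inverse $n'$ is obtained from Python’s built-in three-argument `pow` with exponent $-1$, which is available from Python 3.8 on; that is a convenience of this example rather than the method of the text.

```python
from collections import Counter


def multiples_mod(n, m):
    """The m numbers 0 mod m, n mod m, ..., (m-1)n mod m of (4.45)."""
    return [(k * n) % m for k in range(m)]


def solve_for_k(n, j, m):
    """Return k in [0, m) with k*n mod m == j, assuming gcd(m, n) == 1."""
    n_inverse = pow(n, -1, m)  # n' with n'*n congruent to 1 (mod m)
    return (j * n_inverse) % m


copies = Counter(multiples_mod(8, 12))
```

Here `copies` records how often each value occurs when $m = 12$ and $n = 8$: the values 0, 4, and 8 each appear $d = 4$ times.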

We can use the facts just proved to establish an important result discovered
by Pierre de Fermat in 1640. Fermat was a great mathematician who
contributed to the discovery of calculus and many other parts of mathematics.
He left notebooks containing dozens of theorems stated without proof, and
each of those theorems has subsequently been verified, except one. The one
that remains, now called “Fermat’s Last Theorem,” states that

$$a^n + b^n \ne c^n \tag{4.46}$$

for all positive integers $a$, $b$, $c$, and $n$, when $n > 2$. (Of course there are lots
of solutions to the equations $a + b = c$ and $a^2 + b^2 = c^2$.) This conjecture
has been verified for all $n \le 150000$ by Tanner and Wagstaff [285].

Euler [93] conjectured that $a^4 + b^4 + c^4 \ne d^4$, but Noam Elkies found infinitely
many solutions in August, 1987. Now Roger Frye has done an exhaustive
computer search, proving (after about 110 hours on a Connection Machine)
that the smallest solution is: $95800^4 + 217519^4 + 414560^4 = 422481^4$.

“laquelle proposition, si elle est vraie, est de très grand usage.”
-P. de Fermat [97]

Fermat’s theorem of 1640 is one of the many that turned out to be provable.
It’s now called Fermat’s Little Theorem (or just Fermat’s theorem, for
short), and it states that

$$n^{p-1} \equiv 1 \pmod{p}, \qquad \text{if } n \perp p. \tag{4.47}$$

Proof: As usual, we assume that $p$ denotes a prime. We know that the
$p - 1$ numbers $n \bmod p$, $2n \bmod p$, ..., $(p - 1)n \bmod p$ are the numbers 1, 2,
..., $p - 1$ in some order. Therefore if we multiply them together we get

$$n \cdot (2n) \cdot \ldots \cdot ((p - 1)n) \equiv (n \bmod p) \cdot (2n \bmod p) \cdot \ldots \cdot ((p - 1)n \bmod p) \equiv (p - 1)!,$$

where the congruence is modulo $p$. This means that

$$(p - 1)!\, n^{p-1} \equiv (p - 1)! \pmod{p},$$

and we can cancel the $(p - 1)!$ since it’s not divisible by $p$. QED.

An alternative form of Fermat’s theorem is sometimes more convenient:

$$n^p \equiv n \pmod{p}, \qquad \text{integer } n. \tag{4.48}$$

This congruence holds for all integers $n$. The proof is easy: If $n \perp p$ we
simply multiply (4.47) by $n$. If not, $p \backslash n$, so $n^p \equiv 0 \equiv n$.
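
The key step of the proof, and both forms of the theorem, can be verified numerically for small primes. The function names in this sketch are ours; the three-argument `pow` is Python’s built-in modular exponentiation.

```python
def residues_of_multiples(n, p):
    """Sorted list of n mod p, 2n mod p, ..., (p-1)n mod p."""
    return sorted((k * n) % p for k in range(1, p))


def little_theorem_holds(n, p):
    """Check (4.47): n**(p-1) is congruent to 1 (mod p); assumes p does not divide n."""
    return pow(n, p - 1, p) == 1


def alternative_form_holds(n, p):
    """Check (4.48): n**p is congruent to n (mod p), for any integer n."""
    return pow(n, p, p) == n % p
```

When `n` is not a multiple of the prime `p`, `residues_of_multiples(n, p)` is simply the list 1, 2, ..., p − 1, which is the rearrangement the proof relies on.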

In the same year that he discovered (4.47), Fermat wrote a letter to
Mersenne, saying he suspected that the number

$$f_n = 2^{2^n} + 1$$

would turn out to be prime for all $n \ge 0$. He knew that the first five cases
gave primes:

$$2^1 + 1 = 3; \quad 2^2 + 1 = 5; \quad 2^4 + 1 = 17; \quad 2^8 + 1 = 257; \quad 2^{16} + 1 = 65537;$$

but he couldn’t see how to prove that the next case, $2^{32} + 1 = 4294967297$,
would be prime.

It’s interesting to note that Fermat could have proved that $2^{32} + 1$ is not
prime, using his own recently discovered theorem, if he had taken time to
perform a few dozen multiplications: We can set $n = 3$ in (4.47), deducing
that

$$3^{2^{32}} \equiv 1 \pmod{2^{32} + 1}, \qquad \text{if } 2^{32} + 1 \text{ is prime.}$$

And it’s possible to test this relation by hand, beginning with 3 and squaring
32 times, keeping only the remainders mod $2^{32} + 1$. First we have $3^2 = 9$,
then $3^{2^2} = 81$, then $3^{2^3} = 6561$, and so on until we reach

$$3^{2^{32}} \equiv 3029026160 \pmod{2^{32} + 1}.$$

The result isn’t 1, so $2^{32} + 1$ isn’t prime. This method of disproof gives us
no clue about what the factors might be, but it does prove that factors exist.
(They are 641 and 6700417.)
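
What would have cost Fermat an afternoon takes a computer no time at all. The sketch below follows the hand method literally, squaring 32 times and reducing each time; the names `F5` and `repeated_squaring` are ours.

```python
F5 = 2 ** 32 + 1  # the number 4294967297 that Fermat suspected to be prime


def repeated_squaring(base, squarings, modulus):
    """Square `base` the given number of times, keeping remainders only."""
    x = base % modulus
    for _ in range(squarings):
        x = (x * x) % modulus
    return x


residue = repeated_squaring(3, 32, F5)  # 3**(2**32) mod F5
is_refuted = residue != 1               # True means F5 cannot be prime
```

The text reports the residue 3029026160; whatever the exact value, any result other than 1 is enough to rule out primality.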

If $3^{2^{32}}$ had turned out to be 1, modulo $2^{32} + 1$, the calculation wouldn’t
have proved that $2^{32} + 1$ is prime; it just wouldn’t have disproved it. But
exercise  47  discusses  a converse  to  Fermat’s  theorem  by  which  we  can  prove 
that  large  prime  numbers  are  prime,  without  doing  an  enormous  amount  of 
laborious  arithmetic. 

We  proved  Fermat’s  theorem  by  cancelling  (p  — 1 )!  from  both  sides  of  a 
congruence.  It  turns  out  that  (p  — I)!  is  always  congruent  to  -1,  modulo  p; 
this  is  part  of  a classical  result  known  as  Wilson’s  theorem: 

( n - - 1 ) ! = - 1 (mod  n)  <==4>  n is  prime,  if  n > 1 . (4.49) 

One half of this theorem is trivial: If $n > 1$ is not prime, it has a prime
divisor $p$ that appears as a factor of $(n-1)!$, so $(n-1)!$ cannot be congruent
to $-1$. (If $(n-1)!$ were congruent to $-1$ modulo $n$, it would also be congruent
to $-1$ modulo $p$, but it isn’t; it is congruent to 0 modulo $p$.)

The other half of Wilson’s theorem states that $(p-1)! \equiv -1 \pmod p$.
We can prove this half by pairing up numbers with their inverses mod $p$. If
$n \perp p$, we know that there exists $n'$ such that

$$ n'n \equiv 1 \pmod p; $$

here $n'$ is the inverse of $n$, and $n$ is also the inverse of $n'$. Any two inverses
of $n$ must be congruent to each other, since $nn' \equiv nn''$ implies $n' \equiv n''$.

Now suppose we pair up each number between 1 and $p-1$ with its inverse.
Since the product of a number and its inverse is congruent to 1, the product
of all the numbers in all pairs of inverses is also congruent to 1; so it seems
that $(p-1)!$ is congruent to 1. Let’s check, say for $p = 5$. We get $4! = 24$;
but this is congruent to 4, not 1, modulo 5. Oops, what went wrong? Let’s
take a closer look at the inverses:

$$ 1' = 1, \quad 2' = 3, \quad 3' = 2, \quad 4' = 4. $$

Ah so; 2 and 3 pair up but 1 and 4 don’t; they’re their own inverses.

To resurrect our analysis we must determine which numbers are their
own inverses. If $x$ is its own inverse, then $x^2 \equiv 1 \pmod p$; and we have
already proved that this congruence has exactly two roots when $p > 2$. (If
$p = 2$ it’s obvious that $(p-1)! \equiv -1$, so we needn’t worry about that case.)


If this is Fermat’s
Little Theorem,
the other one was
last but not least.


If p is prime, is p'
prime prime?


“Si fuerit N ad x
numerus primus
et n numerus
partium ad N
primarum, tum
potestas x^n unitate
minuta semper per
numerum N erit
divisibilis.”

- L. Euler [89]


The roots are 1 and $p-1$, and the other numbers (between 1 and $p-1$) pair
up; hence

$$ (p-1)! \equiv 1 \cdot (p-1) \equiv -1, $$

as desired.

Unfortunately, we can’t compute factorials efficiently, so Wilson’s
theorem is of no use as a practical test for primality. It’s just a theorem.
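Although Wilson's theorem is useless as a primality test for large numbers, it is easy to confirm on small ones. The sketch below is our own illustration, not the authors' code; the helper names `is_prime` and `wilson_holds` and the sample prime 13 are assumptions chosen for the example. It checks (4.49) for $2 \le n < 200$ and lists the residues that are their own inverses modulo a prime.

```python
from math import factorial

def is_prime(n):
    """Trial division; adequate for the small range checked here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n**0.5) + 1))

def wilson_holds(n):
    """True when (n-1)! is congruent to -1 modulo n."""
    return factorial(n - 1) % n == n - 1

# (4.49): for n > 1 the congruence holds exactly when n is prime.
agree = all(wilson_holds(n) == is_prime(n) for n in range(2, 200))
print("Wilson's criterion matches trial division:", agree)

# The residues that are their own inverses modulo a prime p are 1 and p - 1.
p = 13
self_inverse = [x for x in range(1, p) if (x * x) % p == 1]
print("self-inverse residues mod", p, ":", self_inverse)
```

The cost of `factorial(n - 1)` grows so quickly that this check is only a demonstration, which is exactly the text's point.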

4.9  PHI  AND  MU 

How many of the integers $\{0, 1, \ldots, m-1\}$ are relatively prime to $m$?

This is an important quantity called $\varphi(m)$, the “totient” of $m$ (so named by
J. J. Sylvester [284], a British mathematician who liked to invent new words).

We have $\varphi(1) = 1$, $\varphi(p) = p-1$, and $\varphi(m) < m-1$ for all composite
numbers $m$.

The $\varphi$ function is called Euler’s totient function, because Euler was the
first person to study it. Euler discovered, for example, that Fermat’s theorem
(4.47) can be generalized to nonprime moduli in the following way:

$$ n^{\varphi(m)} \equiv 1 \pmod m, \qquad \text{if } n \perp m. \tag{4.50} $$

(Exercise 32 asks for a proof of Euler’s theorem.)

If $m$ is a prime power $p^k$, it’s easy to compute $\varphi(m)$, because $n \perp p^k$ if and
only if $p$ does not divide $n$. The multiples of $p$ in $\{0, 1, \ldots, p^k-1\}$ are
$\{0, p, 2p, \ldots, p^k-p\}$; hence there are $p^{k-1}$ of them, and $\varphi(p^k)$ counts
what is left:

$$ \varphi(p^k) = p^k - p^{k-1}. $$

Notice that this formula properly gives $\varphi(p) = p-1$ when $k = 1$.

If $m > 1$ is not a prime power, we can write $m = m_1 m_2$ where $m_1 \perp m_2$.
Then the numbers $0 \le n < m$ can be represented in a residue number system
as $(n \bmod m_1, n \bmod m_2)$. We have

$$ n \perp m \iff n \bmod m_1 \perp m_1 \ \text{ and } \ n \bmod m_2 \perp m_2 $$

by (4.30) and (4.4). Hence, $n \bmod m$ is “good” if and only if $n \bmod m_1$
and $n \bmod m_2$ are both “good,” if we consider relative primality to be a
virtue. The total number of good values modulo $m$ can now be computed,
recursively: It is $\varphi(m_1)\varphi(m_2)$, because there are $\varphi(m_1)$ good ways to choose
the first component $n \bmod m_1$ and $\varphi(m_2)$ good ways to choose the second
component $n \bmod m_2$ in the residue representation.

For example, $\varphi(12) = \varphi(4)\varphi(3) = 2 \cdot 2 = 4$, because $n$ is prime to 12 if
and only if $n \bmod 4 = (1 \text{ or } 3)$ and $n \bmod 3 = (1 \text{ or } 2)$. The four values prime
to 12 are (1,1), (1,2), (3,1), (3,2) in the residue number system; they are
1, 5, 7, 11 in ordinary decimal notation. Euler’s theorem states that $n^4 \equiv 1$
(mod 12) whenever $n \perp 12$.

A function $f(m)$ of positive integers is called multiplicative if $f(1) = 1$
and

$$ f(m_1 m_2) = f(m_1) f(m_2) \qquad \text{whenever } m_1 \perp m_2. \tag{4.51} $$


“Si sint A et B numeri
inter se primi
et numerus partium
ad A primarum
sit = a, numerus
vero partium ad B
primarum sit = b,
tum numerus partium
ad productum
AB primarum erit
= ab.”

- L. Euler [89]


We have just proved that $\varphi(m)$ is multiplicative. We’ve also seen another
instance of a multiplicative function earlier in this chapter: The number of
incongruent solutions to $x^2 \equiv 1 \pmod m$ is multiplicative. Still another
example is $f(m) = m^\alpha$ for any power $\alpha$.

A multiplicative function is defined completely by its values at prime
powers, because we can decompose any positive integer $m$ into its prime-power
factors, which are relatively prime to each other. The general formula

$$ f(m) = \prod_p f(p^{m_p}), \qquad \text{if } m = \prod_p p^{m_p} \tag{4.52} $$

holds if and only if $f$ is multiplicative.

In particular, this formula gives us the value of Euler’s totient function
for general $m$:

$$ \varphi(m) = \prod_{p \backslash m} (p^{m_p} - p^{m_p-1}) = m \prod_{p \backslash m} \Bigl(1 - \frac1p\Bigr). \tag{4.53} $$

For example, $\varphi(12) = (4-2)(3-1) = 12(1-\frac12)(1-\frac13)$.
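Formula (4.53) turns the computation of the totient into a factoring problem. As a rough check, here is a small Python sketch (our illustration; the function names `phi_count` and `phi_formula` are assumptions) that computes $\varphi(m)$ both from the definition and from the prime-power product, and compares the two.

```python
from math import gcd

def phi_count(m):
    """Definition: how many of 0, 1, ..., m-1 are relatively prime to m."""
    return sum(1 for n in range(m) if gcd(n, m) == 1)

def phi_formula(m):
    """Formula (4.53): product of p**k - p**(k-1) over the prime powers p**k in m."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            pk = 1
            while m % p == 0:     # strip every factor p, accumulating p**k
                m //= p
                pk *= p
            result *= pk - pk // p
        p += 1
    if m > 1:                     # a single prime factor above the square root remains
        result *= m - 1
    return result

print(phi_formula(12))            # (4 - 2) * (3 - 1)
print(all(phi_count(m) == phi_formula(m) for m in range(1, 300)))
```

Trial division is used only to keep the sketch short; any factoring method would serve.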

Now let’s look at an application of the $\varphi$ function to the study of rational
numbers mod 1. We say that the fraction $m/n$ is basic if $0 \le m < n$. Therefore
$\varphi(n)$ is the number of reduced basic fractions with denominator $n$; and
the Farey series $\mathcal{F}_n$ contains all the reduced basic fractions with denominator
$n$ or less, as well as the non-basic fraction $\frac11$.

The set of all basic fractions with denominator 12, before reduction to
lowest terms, is

$$ \tfrac{0}{12}, \tfrac{1}{12}, \tfrac{2}{12}, \tfrac{3}{12}, \tfrac{4}{12}, \tfrac{5}{12}, \tfrac{6}{12}, \tfrac{7}{12}, \tfrac{8}{12}, \tfrac{9}{12}, \tfrac{10}{12}, \tfrac{11}{12}. $$

Reduction yields

$$ \tfrac01, \tfrac{1}{12}, \tfrac16, \tfrac14, \tfrac13, \tfrac{5}{12}, \tfrac12, \tfrac{7}{12}, \tfrac23, \tfrac34, \tfrac56, \tfrac{11}{12}, $$

and we can group these fractions by their denominators:

$$ \tfrac01; \quad \tfrac12; \quad \tfrac13, \tfrac23; \quad \tfrac14, \tfrac34; \quad \tfrac16, \tfrac56; \quad \tfrac{1}{12}, \tfrac{5}{12}, \tfrac{7}{12}, \tfrac{11}{12}. $$

What can we make of this? Well, every divisor $d$ of 12 occurs as a denominator,
together with all $\varphi(d)$ of its numerators. The only denominators that
occur are divisors of 12. Thus

$$ \varphi(1) + \varphi(2) + \varphi(3) + \varphi(4) + \varphi(6) + \varphi(12) = 12. $$

A similar thing will obviously happen if we begin with the unreduced fractions
$\frac0m, \frac1m, \ldots, \frac{m-1}{m}$ for any $m$, hence

$$ \sum_{d \backslash m} \varphi(d) = m. \tag{4.54} $$
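The grouping argument can be replayed mechanically. In the sketch below (ours, with the illustrative names `phi` and `by_denominator`), Python's `Fraction` type reduces each $k/12$ to lowest terms, and we tally how often each denominator occurs; the tallies should be $\varphi(d)$ for each divisor $d$, and they should add up to 12.

```python
from collections import Counter
from fractions import Fraction
from math import gcd

def phi(n):
    """Totient from the definition: count 0 <= k < n with gcd(k, n) = 1."""
    return sum(1 for k in range(n) if gcd(k, n) == 1)

m = 12
# Fraction(k, m) is stored in lowest terms, so .denominator is the reduced one.
by_denominator = Counter(Fraction(k, m).denominator for k in range(m))

for d in sorted(by_denominator):
    print(d, by_denominator[d], phi(d))   # denominator, tally, phi(denominator)
print(sum(by_denominator.values()))        # total number of fractions
```

Changing `m` repeats the experiment for any other modulus, which is the content of (4.54).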

We  said  near  the  beginning  of  this  chapter  that  problems  in  number 
theory  often  require  sums  over  the  divisors  of  a number.  Well,  (4.54)  is  one 
such  sum,  so  our  claim  is  vindicated.  (We  will  see  other  examples.) 

Now here’s a curious fact: If $f$ is any function such that the sum

$$ g(m) = \sum_{d \backslash m} f(d) $$

is multiplicative, then $f$ itself is multiplicative. (This result, together with
(4.54) and the fact that $g(m) = m$ is obviously multiplicative, gives another
reason why $\varphi(m)$ is multiplicative.) We can prove this curious fact by
induction on $m$: The basis is easy because $f(1) = g(1) = 1$. Let $m > 1$, and
assume that $f(m_1 m_2) = f(m_1) f(m_2)$ whenever $m_1 \perp m_2$ and $m_1 m_2 < m$. If
$m = m_1 m_2$ and $m_1 \perp m_2$, we have

$$ g(m_1 m_2) = \sum_{d \backslash m_1 m_2} f(d) = \sum_{d_1 \backslash m_1} \sum_{d_2 \backslash m_2} f(d_1 d_2), $$

and $d_1 \perp d_2$ since all divisors of $m_1$ are relatively prime to all divisors of
$m_2$. By the induction hypothesis, $f(d_1 d_2) = f(d_1) f(d_2)$ except possibly when
$d_1 = m_1$ and $d_2 = m_2$; hence we obtain

$$ \Bigl( \sum_{d_1 \backslash m_1} f(d_1) \sum_{d_2 \backslash m_2} f(d_2) \Bigr) - f(m_1) f(m_2) + f(m_1 m_2) $$
$$ = g(m_1) g(m_2) - f(m_1) f(m_2) + f(m_1 m_2). $$

But this equals $g(m_1 m_2) = g(m_1) g(m_2)$, so $f(m_1 m_2) = f(m_1) f(m_2)$.

Conversely, if $f(m)$ is multiplicative, the corresponding sum-over-divisors
function $g(m) = \sum_{d \backslash m} f(d)$ is always multiplicative. In fact, exercise 33 shows
that even more is true. Hence the curious fact is a fact.

The Möbius function $\mu(m)$, named after the nineteenth-century mathematician
August Möbius who also had a famous band, is defined for all $m \ge 1$
by the equation

$$ \sum_{d \backslash m} \mu(d) = [m = 1]. \tag{4.55} $$

This equation is actually a recurrence, since the left-hand side is a sum
consisting of $\mu(m)$ and certain values of $\mu(d)$ with $d < m$. For example, if we
plug in $m = 1, 2, \ldots, 12$ successively we can compute the first twelve values:


    m     |  1   2   3   4   5   6   7   8   9  10  11  12
  $\mu(m)$  |  1  -1  -1   0  -1   1  -1   0   0   1  -1   0
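Treating (4.55) as a recurrence is straightforward to program. The following Python sketch is ours, and the name `mobius_table` is an assumption; it solves for $\mu(m)$ by moving the proper divisors to the right-hand side, exactly as one would when filling in the table by hand.

```python
def mobius_table(limit):
    """Solve (4.55) as a recurrence:
    mu(m) = [m == 1] - (sum of mu(d) over proper divisors d of m)."""
    mu = [0] * (limit + 1)            # index 0 is unused
    for m in range(1, limit + 1):
        proper = sum(mu[d] for d in range(1, m) if m % d == 0)
        mu[m] = (1 if m == 1 else 0) - proper
    return mu

print(mobius_table(12)[1:])
# [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]
```

This takes on the order of `limit**2` steps, which is fine for a table but hints that a closed form would be welcome.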

Möbius came up with the recurrence formula (4.55) because he noticed
that it corresponds to the following important “inversion principle”:

$$ g(m) = \sum_{d \backslash m} f(d) \iff f(m) = \sum_{d \backslash m} \mu(d)\, g\Bigl(\frac{m}{d}\Bigr). \tag{4.56} $$

According to this principle, the $\mu$ function gives us a new way to understand
any function $f(m)$ for which we know $\sum_{d \backslash m} f(d)$.

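The proof of (4.56) uses two tricks (4.7) and (4.9) that we described near
the beginning of this chapter: If $g(m) = \sum_{d \backslash m} f(d)$ then

$$ \sum_{d \backslash m} \mu(d)\, g\Bigl(\frac{m}{d}\Bigr) = \sum_{d \backslash m} \mu\Bigl(\frac{m}{d}\Bigr) g(d) $$
$$ = \sum_{d \backslash m} \mu\Bigl(\frac{m}{d}\Bigr) \sum_{k \backslash d} f(k) $$
$$ = \sum_{k \backslash m} \ \sum_{d \backslash (m/k)} \mu\Bigl(\frac{m}{kd}\Bigr) f(k) $$
$$ = \sum_{k \backslash m} \ \sum_{d \backslash (m/k)} \mu(d)\, f(k) $$
$$ = \sum_{k \backslash m} [m/k = 1]\, f(k) = f(m). $$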

The  other  half  of  (4.56)  is  proved  similarly  (see  exercise  12). 
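A numerical spot check of the inversion principle (4.56) may be reassuring. In this Python sketch (our own; the test function `f` is an arbitrary assumption made only for illustration) we form $g$ from $f$ by summing over divisors and then recover $f$ from $g$ with the Möbius function.

```python
from functools import lru_cache

def divisors(m):
    return [d for d in range(1, m + 1) if m % d == 0]

@lru_cache(maxsize=None)
def mu(m):
    """Moebius function obtained from the recurrence (4.55)."""
    if m == 1:
        return 1
    return -sum(mu(d) for d in divisors(m) if d < m)

def f(m):
    """An arbitrary, non-multiplicative test function (illustrative choice)."""
    return m * m + 3

def g(m):
    """g(m) = sum of f(d) over the divisors d of m."""
    return sum(f(d) for d in divisors(m))

def f_recovered(m):
    """Right-hand side of (4.56): sum of mu(d) * g(m/d) over divisors d of m."""
    return sum(mu(d) * g(m // d) for d in divisors(m))

print(all(f_recovered(m) == f(m) for m in range(1, 100)))
```

Note that `f` need not be multiplicative; the inversion works for any function on the positive integers.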

Relation (4.56) gives us a useful property of the Möbius function, and we
have tabulated the first twelve values; but what is the value of $\mu(m)$ when
$m$ is large? How can we solve the recurrence (4.55)? Well, the function
$g(m) = [m = 1]$ is obviously multiplicative; after all, it’s zero except when
$m = 1$. So the Möbius function defined by (4.55) must be multiplicative, by
what we proved a minute or two ago. Therefore we can figure out what $\mu(m)$
is if we compute $\mu(p^k)$.


Now is a good time
to try warmup
exercise 11.


Depending on
how fast you read.


When $m = p^k$, (4.55) says that

$$ \mu(1) + \mu(p) + \mu(p^2) + \cdots + \mu(p^k) = 0 $$

for all $k \ge 1$, since the divisors of $p^k$ are $1, \ldots, p^k$. It follows that

$$ \mu(p) = -1; \qquad \mu(p^k) = 0 \quad \text{for } k > 1. $$


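Therefore by (4.52), we have the general formula

$$ \mu(m) = \prod_{p \backslash m} \mu(p^{m_p}) = \begin{cases} (-1)^r, & \text{if } m = p_1 p_2 \ldots p_r; \\ 0, & \text{if $m$ is divisible by some } p^2. \end{cases} \tag{4.57} $$

That’s $\mu$.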

If we regard (4.54) as a recurrence for the function $\varphi(m)$, we can solve
that recurrence by applying Möbius’s rule (4.56). The resulting solution is

$$ \varphi(m) = \sum_{d \backslash m} \mu(d)\, \frac{m}{d}. \tag{4.58} $$

For example,

$$ \varphi(12) = \mu(1) \cdot 12 + \mu(2) \cdot 6 + \mu(3) \cdot 4 + \mu(4) \cdot 3 + \mu(6) \cdot 2 + \mu(12) \cdot 1 $$
$$ = 12 - 6 - 4 + 0 + 2 + 0 = 4. $$
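Formulas (4.57) and (4.58) combine into a short program. The sketch below is an illustration of ours, not code from the text; `mu` follows the case analysis of (4.57) by trial division, and `phi` is the divisor sum (4.58).

```python
from math import gcd

def mu(m):
    """Formula (4.57): (-1)**r if m is a product of r distinct primes,
    and 0 if the square of some prime divides m."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:        # p**2 divided the original argument
                return 0
            result = -result
        p += 1
    if m > 1:                     # one more prime factor is left over
        result = -result
    return result

def phi(m):
    """Formula (4.58): sum of mu(d) * (m/d) over the divisors d of m."""
    return sum(mu(d) * (m // d) for d in range(1, m + 1) if m % d == 0)

print([mu(m) for m in range(1, 13)])
print(phi(12))
```

Only the squarefree divisors contribute to the sum, which is why (4.58) has so few nonzero terms.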


If $m$ is divisible by $r$ different primes, say $\{p_1, \ldots, p_r\}$, the sum (4.58) has only
$2^r$ nonzero terms, because the $\mu$ function is often zero. Thus we can see that
(4.58) checks with formula (4.53), which reads

$$ \varphi(m) = m \Bigl(1 - \frac{1}{p_1}\Bigr) \cdots \Bigl(1 - \frac{1}{p_r}\Bigr); $$

if we multiply out the $r$ factors $(1 - 1/p_j)$, we get precisely the $2^r$ nonzero
terms of (4.58). The advantage of the Möbius function is that it applies in
many situations besides this one.

For example, let’s try to figure out how many fractions are in the Farey
series $\mathcal{F}_n$. This is the number of reduced fractions in [0, 1] whose denominators
do not exceed $n$, so it is 1 greater than $\Phi(n)$ where we define

$$ \Phi(x) = \sum_{1 \le k \le x} \varphi(k). \tag{4.59} $$

(We must add 1 to $\Phi(n)$ because of the final fraction $\frac11$.) The sum in (4.59)
looks difficult, but we can determine $\Phi(x)$ indirectly by observing that

$$ \sum_{d \ge 1} \Phi\Bigl(\frac{x}{d}\Bigr) = \tfrac12 \lfloor x \rfloor \lfloor 1 + x \rfloor \tag{4.60} $$

for all real $x \ge 0$. Why does this identity hold? Well, it’s a bit awesome yet
not really beyond our ken. There are $\frac12 \lfloor x \rfloor \lfloor 1 + x \rfloor$ basic fractions $m/n$ with
$0 \le m < n \le x$, counting both reduced and unreduced fractions; that gives
us the right-hand side. The number of such fractions with $\gcd(m, n) = d$
is $\Phi(x/d)$, because such fractions are $m'/n'$ with $0 \le m' < n' \le x/d$ after
replacing $m$ by $m'd$ and $n$ by $n'd$. So the left-hand side counts the same
fractions in a different way, and the identity must be true.

Let’s look more closely at the situation, so that equations (4.59) and
(4.60) become clearer. The definition of $\Phi(x)$ implies that $\Phi(x) = \Phi(\lfloor x \rfloor)$;
but it turns out to be convenient to define $\Phi(x)$ for arbitrary real values, not
just for integers. At integer values we have the table


     n       |  0   1   2   3   4   5   6   7   8   9  10  11  12
  $\varphi(n)$  |  -   1   1   2   2   4   2   6   4   6   4  10   4
  $\Phi(n)$     |  0   1   2   4   6  10  12  18  22  28  32  42  46


(This extension to
real values is a useful
trick for many
recurrences that
arise in the analysis
of algorithms.)


and we can check (4.60) when $x = 12$:

$$ \Phi(12) + \Phi(6) + \Phi(4) + \Phi(3) + \Phi(2) + \Phi(2) + 6 \cdot \Phi(1) $$
$$ = 46 + 12 + 6 + 4 + 2 + 2 + 6 = 78 = \tfrac12 \cdot 12 \cdot 13. $$

Amazing. 
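Identity (4.60) is stated for all real $x \ge 0$, so it is worth testing at non-integer points too. The sketch below is ours; `Phi`, `lhs`, and `rhs` are illustrative names, and exact rational arithmetic via `Fraction` is an assumption made to avoid any floating-point doubt when taking floors of $x/d$.

```python
from fractions import Fraction
from math import floor, gcd

def phi(k):
    return sum(1 for j in range(k) if gcd(j, k) == 1)

def Phi(x):
    """(4.59): sum of phi(k) for 1 <= k <= x; x may be any rational >= 0."""
    return sum(phi(k) for k in range(1, floor(x) + 1))

def lhs(x):
    """Left side of (4.60). Phi(x/d) is zero once d > x, so the sum is finite."""
    x = Fraction(x)
    return sum(Phi(x / d) for d in range(1, floor(x) + 1))

def rhs(x):
    """Right side of (4.60): floor(x) * floor(1 + x) / 2."""
    x = Fraction(x)
    return floor(x) * floor(1 + x) // 2

for x in (12, Fraction(15, 2), Fraction(1, 2), 30):
    print(x, lhs(x), rhs(x))
```

For $x = 12$ both sides come out to 78, matching the hand computation above.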

Identity (4.60) can be regarded as an implicit recurrence for $\Phi(x)$; for
example, we’ve just seen that we could have used it to calculate $\Phi(12)$ from
certain values of $\Phi(m)$ with $m < 12$. And we can solve such recurrences by
using another beautiful property of the Möbius function:

$$ g(x) = \sum_{d \ge 1} f(x/d) \iff f(x) = \sum_{d \ge 1} \mu(d)\, g(x/d). \tag{4.61} $$

This inversion law holds for all functions $f$ such that $\sum_{k, d \ge 1} |f(x/kd)| < \infty$;
we can prove it as follows. Suppose $g(x) = \sum_{d \ge 1} f(x/d)$. Then

$$ \sum_{d \ge 1} \mu(d)\, g(x/d) = \sum_{d \ge 1} \mu(d) \sum_{k \ge 1} f(x/kd) $$
$$ = \sum_{m \ge 1} f(x/m) \sum_{d, k \ge 1} \mu(d)\, [m = kd] $$
$$ = \sum_{m \ge 1} f(x/m) \sum_{d \backslash m} \mu(d) = \sum_{m \ge 1} f(x/m)\, [m = 1] = f(x). $$

The  proof  in  the  other  direction  is  essentially  the  same. 

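So now we can solve the recurrence (4.60) for $\Phi(x)$:

$$ \Phi(x) = \frac12 \sum_{d \ge 1} \mu(d)\, \lfloor x/d \rfloor \lfloor 1 + x/d \rfloor. \tag{4.62} $$

This is always a finite sum. For example,

$$ \Phi(12) = \tfrac12 (12 \cdot 13 - 6 \cdot 7 - 4 \cdot 5 + 0 - 2 \cdot 3 + 2 \cdot 3 $$
$$ \qquad\qquad - 1 \cdot 2 + 0 + 0 + 1 \cdot 2 - 1 \cdot 2 + 0) $$
$$ = 78 - 21 - 10 - 3 + 3 - 1 + 1 - 1 = 46. $$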
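Formula (4.62) can be compared against the definition (4.59) directly. In this Python sketch (ours; the names `Phi_direct` and `Phi_mobius` are assumptions) we use the facts that $\lfloor x/d \rfloor = \lfloor \lfloor x \rfloor / d \rfloor$ and $\lfloor 1 + x/d \rfloor = 1 + \lfloor x/d \rfloor$ for integer $d \ge 1$, so that integer division suffices.

```python
from math import floor, gcd

def mu(m):
    """Moebius function by trial division, following (4.57)."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result

def Phi_direct(x):
    """(4.59): sum of phi(k) for 1 <= k <= x, with phi taken from its definition."""
    return sum(sum(1 for j in range(k) if gcd(j, k) == 1)
               for k in range(1, floor(x) + 1))

def Phi_mobius(x):
    """(4.62): half the sum of mu(d) * floor(x/d) * floor(1 + x/d).
    Terms with d > x vanish, so d runs only up to floor(x)."""
    n = floor(x)
    return sum(mu(d) * (n // d) * (1 + n // d) for d in range(1, n + 1)) // 2

print(Phi_mobius(12), Phi_direct(12))
```

The Möbius form needs no gcd computations at all, only a table of $\mu$.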

In Chapter 9 we’ll see how to use (4.62) to get a good approximation to $\Phi(x)$;
in fact, we’ll prove that

$$ \Phi(x) = \frac{3}{\pi^2} x^2 + O(x \log x). $$

Therefore the function $\Phi(x)$ grows “smoothly”; it averages out the erratic
behavior of $\varphi(k)$.

In  keeping  with  the  tradition  established  last  chapter,  let’s  conclude  this 
chapter  with  a problem  that  illustrates  much  of  what  we’ve  just  seen  and  that 
also  points  ahead  to  the  next  chapter.  Suppose  we  have  beads  of  n different 
colors;  our  goal  is  to  count  how  many  different  ways  there  are  to  string  them 
into  circular  necklaces  of  length  m.  We  can  try  to  “name  and  conquer”  this 
problem by calling the number of possible necklaces $N(m, n)$.

For example, with two colors of beads R and B, we can make necklaces
of length 4 in $N(4, 2) = 6$ different ways:

[Figure: the six circular necklaces of length 4; reading the beads around each
circle gives RRRR, RRRB, RRBB, RBRB, RBBB, and BBBB.]


All other ways are equivalent to one of these, because rotations of a necklace
do not change it. However, reflections are considered to be different; in the
case $m = 6$, for example, a necklace can be different from its reflection.

[Figure: two six-bead necklaces in R and B, each the reflection of the other,
captioned “is different from”.]

The problem of counting these configurations was first solved by P. A. MacMahon
in 1892 [212].

There’s no obvious recurrence for $N(m, n)$, but we can count the necklaces
by breaking them each into linear strings in $m$ ways and considering the
resulting fragments. For example, when $m = 4$ and $n = 2$ we get

    RRRR  RRRR  RRRR  RRRR
    RRBR  RRRB  BRRR  RBRR
    RBBR  RRBB  BRRB  BBRR
    RBRB  BRBR  RBRB  BRBR
    RBBB  BRBB  BBRB  BBBR
    BBBB  BBBB  BBBB  BBBB

Each of the $n^m$ possible patterns appears at least once in this array of
$mN(m, n)$ strings, and some patterns appear more than once. How many
times does a pattern $a_0 \ldots a_{m-1}$ appear? That’s easy: It’s the number of
cyclic shifts $a_k \ldots a_{m-1} a_0 \ldots a_{k-1}$ that produce the same pattern as the
original $a_0 \ldots a_{m-1}$. For example, BRBR occurs twice, because the four ways to
cut the necklace formed from BRBR produce four cyclic shifts (BRBR, RBRB,
BRBR, RBRB); two of these coincide with BRBR itself. This argument shows
that

$$ mN(m, n) = \sum_{a_0, \ldots, a_{m-1} \in S_n} \ \sum_{0 \le k < m} [a_0 \ldots a_{m-1} = a_k \ldots a_{m-1} a_0 \ldots a_{k-1}] $$
$$ = \sum_{0 \le k < m} \ \sum_{a_0, \ldots, a_{m-1} \in S_n} [a_0 \ldots a_{m-1} = a_k \ldots a_{m-1} a_0 \ldots a_{k-1}]. $$

Here $S_n$ is a set of $n$ different colors.

Let’s see how many patterns satisfy $a_0 \ldots a_{m-1} = a_k \ldots a_{m-1} a_0 \ldots a_{k-1}$,
when $k$ is given. For example, if $m = 12$ and $k = 8$, we want to count the
number of solutions to

$$ a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10} a_{11} = a_8 a_9 a_{10} a_{11} a_0 a_1 a_2 a_3 a_4 a_5 a_6 a_7. $$

This means $a_0 = a_8 = a_4$; $a_1 = a_9 = a_5$; $a_2 = a_{10} = a_6$; and $a_3 = a_{11} = a_7$.
So the values of $a_0$, $a_1$, $a_2$, and $a_3$ can be chosen in $n^4$ ways, and the remaining
$a$’s depend on them. Does this look familiar? In general, the solution to

$$ a_j = a_{(j+k) \bmod m}, \qquad \text{for } 0 \le j < m $$

makes us equate $a_j$ with $a_{(j+kl) \bmod m}$ for $l = 1, 2, \ldots$; and we know that
the multiples of $k$ modulo $m$ are $\{0, d, 2d, \ldots, m-d\}$, where $d = \gcd(k, m)$.
Therefore the general solution is to choose $a_0, \ldots, a_{d-1}$ independently and
then to set $a_j = a_{j-d}$ for $d \le j < m$. There are $n^d$ solutions.

We have just proved that

$$ mN(m, n) = \sum_{0 \le k < m} n^{\gcd(k, m)}. $$

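This sum can be simplified, since it includes only terms $n^d$ where $d \backslash m$.
Substituting $d = \gcd(k, m)$ yields

$$ N(m, n) = \frac1m \sum_{d \backslash m} n^d \sum_{0 \le k < m} [d = \gcd(k, m)] $$
$$ = \frac1m \sum_{d \backslash m} n^d \sum_{0 \le k < m} [k/d \perp m/d] $$
$$ = \frac1m \sum_{d \backslash m} n^d \sum_{0 \le k < m/d} [k \perp m/d]. $$

(We are allowed to replace $k/d$ by $k$ because $k$ must be a multiple of $d$.)
Finally, we have $\sum_{0 \le k < m/d} [k \perp m/d] = \varphi(m/d)$ by definition, so we obtain
MacMahon’s formula:

$$ N(m, n) = \frac1m \sum_{d \backslash m} n^d\, \varphi\Bigl(\frac{m}{d}\Bigr) = \frac1m \sum_{d \backslash m} \varphi(d)\, n^{m/d}. \tag{4.63} $$

When $m = 4$ and $n = 2$, for example, the number of necklaces is $\frac14 (1 \cdot 2^4 +
1 \cdot 2^2 + 2 \cdot 2^1) = 6$, just as we suspected.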
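For small $m$ and $n$ we can count necklaces by brute force and compare with MacMahon's formula. The following Python sketch is our own illustration; the function names and the choice of "least rotation" as a canonical representative are assumptions, not the method of the text. Colors are represented by the integers $0, \ldots, n-1$.

```python
from itertools import product
from math import gcd

def phi(k):
    return sum(1 for j in range(k) if gcd(j, k) == 1)

def necklaces_formula(m, n):
    """MacMahon's formula (4.63): (1/m) * sum of phi(d) * n**(m/d) over d dividing m."""
    total = sum(phi(d) * n ** (m // d) for d in range(1, m + 1) if m % d == 0)
    return total // m

def necklaces_gcd_sum(m, n):
    """The intermediate form: (1/m) * sum of n**gcd(k, m) for 0 <= k < m."""
    return sum(n ** gcd(k, m) for k in range(m)) // m

def necklaces_brute(m, n):
    """Count rotation classes by recording the least rotation of every pattern."""
    seen = set()
    for pattern in product(range(n), repeat=m):
        seen.add(min(pattern[k:] + pattern[:k] for k in range(m)))
    return len(seen)

print(necklaces_formula(4, 2), necklaces_gcd_sum(4, 2), necklaces_brute(4, 2))
```

The brute-force count takes about $m \cdot n^m$ steps, so it is only a check; the formula needs just the divisors of $m$.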

It’s not immediately obvious that the value $N(m, n)$ defined by MacMahon’s
sum is an integer! Let’s try to prove directly that

$$ \sum_{d \backslash m} \varphi(d)\, n^{m/d} \equiv 0 \pmod m, \tag{4.64} $$

without using the clue that this is related to necklaces. In the special case
that $m$ is prime, this congruence reduces to $n^p + (p-1)n \equiv 0 \pmod p$; that
is, it reduces to $n^p \equiv n$. We’ve seen in (4.48) that this congruence is an
alternative form of Fermat’s theorem. Therefore (4.64) holds when $m = p$;
we can regard it as a generalization of Fermat’s theorem to the case when the
modulus is not prime. (Euler’s generalization (4.50) is different.)

We’ve  proved  (4.64)  for  all  prime  moduli,  so  let’s  look  at  the  smallest 
case  left,  m = 4.  We  must  prove  that 

$$ n^4 + n^2 + 2n \equiv 0 \pmod 4. $$

The proof is easy if we consider even and odd cases separately. If $n$ is even,
all three terms on the left are congruent to 0 modulo 4, so their sum is too. If
$n$ is odd, $n^4$ and $n^2$ are each congruent to 1, and $2n$ is congruent to 2; hence
the left side is congruent to $1 + 1 + 2$ and thus to 0 modulo 4, and we’re done.

Next, let’s be a bit daring and try $m = 12$. This value of $m$ ought to
be interesting because it has lots of factors, including the square of a prime,
yet it is fairly small. (Also there’s a good chance we’ll be able to generalize a
proof for 12 to a proof for general $m$.) The congruence we must prove is

$$ n^{12} + n^6 + 2n^4 + 2n^3 + 2n^2 + 4n \equiv 0 \pmod{12}. $$

Now what? By (4.42) this congruence holds if and only if it also holds
modulo 3 and modulo 4. So let’s prove that it holds modulo 3. Our congruence
(4.64) holds for primes, so we have $n^3 + 2n \equiv 0 \pmod 3$. Careful
scrutiny reveals that we can use this fact to group terms of the larger sum:

$$ n^{12} + n^6 + 2n^4 + 2n^3 + 2n^2 + 4n $$
$$ = (n^{12} + 2n^4) + (n^6 + 2n^2) + 2(n^3 + 2n) $$
$$ \equiv 0 + 0 + 2 \cdot 0 \equiv 0 \pmod 3. $$

So it works modulo 3.

We’re half done. To prove congruence modulo 4 we use the same trick.
We’ve proved that $n^4 + n^2 + 2n \equiv 0 \pmod 4$, so we use this pattern to group:

$$ n^{12} + n^6 + 2n^4 + 2n^3 + 2n^2 + 4n $$
$$ = (n^{12} + n^6 + 2n^3) + 2(n^4 + n^2 + 2n) $$
$$ \equiv 0 + 2 \cdot 0 \equiv 0 \pmod 4. $$

QED for the case $m = 12$.

So far we’ve proved our congruence for prime $m$, for $m = 4$, and for $m =
12$. Now let’s try to prove it for prime powers. For concreteness we may
suppose that $m = p^3$ for some prime $p$. Then the left side of (4.64) is

$$ n^{p^3} + \varphi(p)\, n^{p^2} + \varphi(p^2)\, n^p + \varphi(p^3)\, n $$
$$ = n^{p^3} + (p-1)\, n^{p^2} + (p^2 - p)\, n^p + (p^3 - p^2)\, n $$
$$ = (n^{p^3} - n^{p^2}) + p\,(n^{p^2} - n^p) + p^2 (n^p - n) + p^3 n. $$

We can show that this is congruent to 0 modulo $p^3$ if we can prove that
$n^{p^3} - n^{p^2}$ is divisible by $p^3$, that $n^{p^2} - n^p$ is divisible by $p^2$, and that $n^p - n$
is divisible by $p$, because the whole thing will then be divisible by $p^3$. By the
alternative form of Fermat’s theorem we have $n^p \equiv n \pmod p$, so $p$ divides
$n^p - n$; hence there is an integer $q$ such that

$$ n^p = n + pq. $$

Now we raise both sides to the $p$th power, expand the right side according to
the binomial theorem (which we’ll meet in Chapter 5), and regroup, giving

$$ n^{p^2} = (n + pq)^p = n^p + (pq)^1 n^{p-1} \binom{p}{1} + (pq)^2 n^{p-2} \binom{p}{2} + \cdots $$
$$ = n^p + p^2 Q $$

for some other integer $Q$. We’re able to pull out a factor of $p^2$ here because
$\binom{p}{1} = p$ in the second term, and because a factor of $(pq)^2$ appears in all the
terms that follow. So we find that $p^2$ divides $n^{p^2} - n^p$.


QED: Quite Easily
Done.


Again we raise both sides to the $p$th power, expand, and regroup, to get

$$ n^{p^3} = (n^p + p^2 Q)^p $$
$$ = n^{p^2} + (p^2 Q)^1 n^{p(p-1)} \binom{p}{1} + (p^2 Q)^2 n^{p(p-2)} \binom{p}{2} + \cdots $$
$$ = n^{p^2} + p^3 \mathcal{Q} $$

for yet another integer $\mathcal{Q}$. So $p^3$ divides $n^{p^3} - n^{p^2}$. This finishes the proof for
$m = p^3$, because we’ve shown that $p^3$ divides the left-hand side of (4.64).
Moreover we can prove by induction that

$$ n^{p^k} = n^{p^{k-1}} + p^k \mathfrak{Q} $$

for some final integer $\mathfrak{Q}$ (final because we’re running out of fonts); hence

$$ n^{p^k} \equiv n^{p^{k-1}} \pmod{p^k}, \qquad \text{for } k > 0. \tag{4.65} $$


Thus the left side of (4.64), which is

$$ (n^{p^k} - n^{p^{k-1}}) + p\,(n^{p^{k-1}} - n^{p^{k-2}}) + \cdots + p^{k-1} (n^p - n) + p^k n, $$

is divisible by $p^k$ and so is congruent to 0 modulo $p^k$.
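The prime-power case can be spot-checked numerically. This Python sketch is ours (the names `check_465` and `prime_power_sum` are assumptions); it tests congruence (4.65) with modular exponentiation and then evaluates the left side of (4.64) for $m = p^k$ using $\varphi(p^j) = p^j - p^{j-1}$.

```python
def check_465(p, k, n):
    """(4.65): n**(p**k) and n**(p**(k-1)) agree modulo p**k."""
    modulus = p ** k
    return pow(n, p ** k, modulus) == pow(n, p ** (k - 1), modulus)

def prime_power_sum(p, k, n):
    """Left side of (4.64) for m = p**k:
    n**(p**k) plus the sum of (p**j - p**(j-1)) * n**(p**(k-j)) for 1 <= j <= k."""
    total = n ** (p ** k)
    for j in range(1, k + 1):
        total += (p ** j - p ** (j - 1)) * n ** (p ** (k - j))
    return total

cases = [(p, k, n) for p in (2, 3, 5) for k in (1, 2, 3) for n in range(0, 13)]
print(all(check_465(p, k, n) for p, k, n in cases))
print(all(prime_power_sum(p, k, n) % p ** k == 0 for p, k, n in cases))
```

Of course a finite check proves nothing for general $k$; the induction in the text does that.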

We’re almost there. Now that we’ve proved (4.64) for prime powers, all
that remains is to prove it when $m = m_1 m_2$, where $m_1 \perp m_2$, assuming that
the congruence is true for $m_1$ and $m_2$. Our examination of the case $m = 12$,
which factored into instances of $m = 3$ and $m = 4$, encourages us to think
that this approach will work.

We know that the $\varphi$ function is multiplicative, so we can write

$$ \sum_{d \backslash m} \varphi(d)\, n^{m/d} = \sum_{d_1 \backslash m_1,\ d_2 \backslash m_2} \varphi(d_1 d_2)\, n^{m_1 m_2 / d_1 d_2} $$
$$ = \sum_{d_1 \backslash m_1} \varphi(d_1) \Bigl( \sum_{d_2 \backslash m_2} \varphi(d_2)\, (n^{m_1/d_1})^{m_2/d_2} \Bigr). $$

But the inner sum is congruent to 0 modulo $m_2$, because we’ve assumed that
(4.64) holds for $m_2$; so the entire sum is congruent to 0 modulo $m_2$. By a
symmetric argument, we find that the entire sum is congruent to 0 modulo $m_1$
as well. Thus by (4.42) it’s congruent to 0 modulo $m$. QED.
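With the proof complete, a direct numerical test of (4.64) over a range of moduli is a cheap sanity check. The Python sketch below is ours; `macmahon_sum` and `violations` are illustrative names, and the ranges tested are arbitrary assumptions. Negative values of $n$ are included because the congruence is a polynomial identity modulo $m$ and so should hold for every integer $n$.

```python
from math import gcd

def phi(d):
    return sum(1 for j in range(d) if gcd(j, d) == 1)

def macmahon_sum(m, n):
    """Left side of (4.64): sum of phi(d) * n**(m/d) over the divisors d of m."""
    return sum(phi(d) * n ** (m // d) for d in range(1, m + 1) if m % d == 0)

# Collect every (m, n) in the tested range for which m fails to divide the sum.
violations = [(m, n)
              for m in range(1, 41)
              for n in range(-6, 13)
              if macmahon_sum(m, n) % m != 0]

print(macmahon_sum(12, 2) // 12)   # number of 2-colored necklaces of length 12
print(violations)                  # an empty list means no counterexample was found
```

For $m = 12$ the function reproduces the polynomial $n^{12} + n^6 + 2n^4 + 2n^3 + 2n^2 + 4n$ examined above.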

Exercises 

Warmups 

1 What is the smallest positive integer that has exactly $k$ divisors, for
$1 \le k \le 6$?

2 Prove that $\gcd(m, n) \cdot \mathrm{lcm}(m, n) = m \cdot n$, and use this identity to express
$\mathrm{lcm}(m, n)$ in terms of $\mathrm{lcm}(n \bmod m, m)$, when $n \bmod m \ne 0$. Hint: Use
(4.12), (4.14), and (4.15).

3 Let $\pi(x)$ be the number of primes not exceeding $x$. Prove or disprove:

$$ \pi(x) - \pi(x-1) = [x \text{ is prime}]. $$

4 What would happen if the Stern-Brocot construction started with the
five fractions $(\frac01, \frac10, \frac{0}{-1}, \frac{-1}{0}, \frac01)$ instead of with $(\frac01, \frac10)$?

5 Find simple formulas for $L^k$ and $R^k$, when $L$ and $R$ are the $2 \times 2$ matrices
of (4.33).

6 What does ‘$a \equiv b \pmod 0$’ mean?

7 Ten  people  numbered  1 to  10  are  lined  up  in  a circle  as  in  the  Josephus 
problem,  and  every  mth  person  is  executed.  (The  value  of  m may  be 
much  larger  than  10.)  Prove  that  the  first  three  people  to  go  cannot  be 
10,  k,  and  k + 1 (in  this  order),  for  any  k. 

8 The residue number system $(x \bmod 3, x \bmod 5)$ considered in the text has
the curious property that 13 corresponds to (1, 3), which looks almost the
same. Explain how to find all instances of such a coincidence, without
calculating all fifteen pairs of residues. In other words, find all solutions
to the congruences

$$ 10x + y \equiv x \pmod 3, \qquad 10x + y \equiv y \pmod 5. $$

Hint: Use the facts that $10u + 6v \equiv u \pmod 3$ and $10u + 6v \equiv v \pmod 5$.

9 Show that $(3^{77} - 1)/2$ is odd and composite. Hint: What is $3^{77} \bmod 4$?

10 Compute $\varphi(999)$.

11 Find a function $\sigma(n)$ with the property that

$$ g(n) = \sum_{0 \le k \le n} f(k) \iff f(n) = \sum_{0 \le k \le n} \sigma(k)\, g(n-k). $$

(This is analogous to the Möbius function; see (4.56).)

12 Simplify the formula $\sum_{d \backslash m} \sum_{k \backslash d} \mu(k)\, g(d/k)$.

13 A positive integer $n$ is called squarefree if it is not divisible by $m^2$ for
any $m > 1$. Find a necessary and sufficient condition that $n$ is squarefree,
a in terms of the prime-exponent representation (4.11) of $n$;
b in terms of $\mu(n)$.

Basics 

14  Prove  or  disprove: 

a gcd(km,  kn)  = kgcd(m, n)  ; 
b lcm(km,  kn)  = klcm(m,n)  . 

15 Does every prime occur as a factor of some Euclid number $e_n$?

16 What is the sum of the reciprocals of the first $n$ Euclid numbers?

17 Let $f_n$ be the “Fermat number” $2^{2^n} + 1$. Prove that $f_m \perp f_n$ if $m < n$.

18 Show that if $2^n + 1$ is prime then $n$ is a power of 2.

19 For every positive integer $n$ there’s a prime $p$ such that $n < p \le 2n$. (This
is essentially “Bertrand’s postulate,” which Joseph Bertrand verified for
$n < 3000000$ in 1845 and Chebyshev proved for all $n$ in 1850.) Use
Bertrand’s postulate to prove that there’s a constant $b \approx 1.25$ such that
the numbers

$$ \lfloor 2^b \rfloor, \ \lfloor 2^{2^b} \rfloor, \ \lfloor 2^{2^{2^b}} \rfloor, \ \ldots $$

are all prime.

20 Let $P_n$ be the $n$th prime number. Find a constant $K$ such that

$$ \lfloor (10^{n^2} K) \bmod 10^n \rfloor = P_n. $$

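21 Prove the following identities when $n$ is a positive integer:

$$ \sum_{1 \le k < n} \Bigl\lfloor \frac{\varphi(k+1)}{k} \Bigr\rfloor = \sum_{1 < m \le n} \Bigl\lfloor \Bigl( \sum_{1 \le k < m} \bigl\lfloor (m/k) / \lceil m/k \rceil \bigr\rfloor \Bigr)^{-1} \Bigr\rfloor $$
$$ = n - 1 - \sum_{k=1}^{n} \Bigl\lceil \Bigl\{ \frac{(k-1)! + 1}{k} \Bigr\} \Bigr\rceil. $$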

Hint: This is a trick question and the answer is pretty easy.

22 The number 1111111111111111111 is prime. Prove that, in any radix $b$,
$(11 \ldots 1)_b$ can be prime only if the number of 1’s is prime.

23 State a recurrence for $\rho(k)$, the ruler function in the text’s discussion of
$\epsilon_2(n!)$. Show that there’s a connection between $\rho(k)$ and the disk that’s
moved at step $k$ when an $n$-disk Tower of Hanoi is being transferred in
$2^n - 1$ moves, for $1 \le k \le 2^n - 1$.

24 Express $\epsilon_p(n!)$ in terms of $\nu_p(n)$, the sum of the digits in the radix $p$
representation of $n$, thereby generalizing (4.24).

25 We say that $m$ exactly divides $n$, written $m \backslash\backslash n$, if $m \backslash n$ and $m \perp n/m$.
For example, in the text’s discussion of factorial factors, $p^{\epsilon_p(n!)} \backslash\backslash n!$.
Prove or disprove the following:

a $k \backslash\backslash n$ and $m \backslash\backslash n \iff km \backslash\backslash n$, if $k \perp m$.
b For all $m, n > 0$, either $\gcd(m, n) \backslash\backslash m$ or $\gcd(m, n) \backslash\backslash n$.

26 Consider the sequence $\mathcal{G}_N$ of all nonnegative reduced fractions $m/n$ such
that $mn \le N$. For example,

$$ \mathcal{G}_{10} = \tfrac01, \tfrac{1}{10}, \tfrac19, \tfrac18, \tfrac17, \tfrac16, \tfrac15, \tfrac14, \tfrac13, \tfrac25, \tfrac12, \tfrac23, \tfrac11, \tfrac32, \tfrac21, \tfrac52, \tfrac31, \tfrac41, \tfrac51, \tfrac61, \tfrac71, \tfrac81, \tfrac91, \tfrac{10}{1}. $$

Is it true that $m'n - mn' = 1$ whenever $m/n$ immediately precedes
$m'/n'$ in $\mathcal{G}_N$?

27 Give a simple rule for comparing rational numbers based on their
representations as L’s and R’s in the Stern-Brocot number system.

28 The Stern-Brocot representation of $\pi$ is

$$ \pi = R^3 L^7 R^{15} L R^{292} L R L R^2 L R^3 L R^{14} L^2 R \ldots; $$

use it to find all the simplest rational approximations to $\pi$ whose
denominators are less than 50. Is $\frac{22}{7}$ one of them?

29 The text describes a correspondence between binary real numbers $x =
(.b_1 b_2 b_3 \ldots)_2$ in $[0, 1)$ and Stern-Brocot real numbers $\alpha = B_1 B_2 B_3 \ldots$ in
$[0, \infty)$. If $x$ corresponds to $\alpha$ and $x \ne 0$, what number corresponds to
$1 - x$?

30 Prove the following statement (the Chinese Remainder Theorem): Let
$m_1, \ldots, m_r$ be integers with $m_j \perp m_k$ for $1 \le j < k \le r$; let $m =
m_1 \ldots m_r$; and let $a_1, \ldots, a_r, A$ be integers. Then there is exactly one
integer $a$ such that

$$ a \equiv a_k \pmod{m_k} \ \text{ for } 1 \le k \le r \qquad \text{and} \qquad A \le a < A + m. $$


Is  this  a test  for 
strabismus? 


Look,  ma, 
sideways  addition. 


31 A number in decimal notation is divisible by 3 if and only if the sum of
its digits is divisible by 3. Prove this well-known rule, and generalize it.

Why is “Euler”
pronounced “Oiler”
when “Euclid” is
“Yooklid”?


32  Prove  Euler’s  theorem  (4.50)  by  generalizing  the  proof  of  (4.47). 

33 Show that if $f(m)$ and $g(m)$ are multiplicative functions, then so is
$h(m) = \sum_{d \backslash m} f(d)\, g(m/d)$.

34  Prove  that  (4.56)  is  a special  case  of  (4.61). 

Homework  exercises 

35 Let $I(m, n)$ be a function that satisfies the relation

$$ I(m, n)\, m + I(n, m)\, n = \gcd(m, n), $$

when $m$ and $n$ are nonnegative integers with $m \ne n$. Thus, $I(m, n) = m'$
and $I(n, m) = n'$ in (4.5); the value of $I(m, n)$ is an inverse of $m$ with
respect to $n$. Find a recurrence that defines $I(m, n)$.

36 Consider the set $Z(\sqrt{10}) = \{ m + n\sqrt{10} \mid \text{integer } m, n \}$. The number
$m + n\sqrt{10}$ is called a unit if $m^2 - 10n^2 = \pm 1$, since it has an inverse
(that is, since $(m + n\sqrt{10}) \cdot \pm(m - n\sqrt{10}) = 1$). For example, $3 + \sqrt{10}$ is
a unit, and so is $19 - 6\sqrt{10}$. Pairs of cancelling units can be inserted into
any factorization, so we ignore them. Nonunit numbers of $Z(\sqrt{10})$ are
called prime if they cannot be written as a product of two nonunits. Show
that 2, 3, and $4 \pm \sqrt{10}$ are primes of $Z(\sqrt{10})$. Hint: If $2 = (k + l\sqrt{10}) \times
(m + n\sqrt{10})$ then $4 = (k^2 - 10l^2)(m^2 - 10n^2)$. Furthermore, the square
of any integer mod 10 is 0, 1, 4, 5, 6, or 9.

37 Prove (4.17). Hint: Show that $e_n - \frac12 = (e_{n-1} - \frac12)^2 + \frac14$, and consider
$2^{-n} \log(e_n - \frac12)$.

38 Prove that if $a \perp b$ and $a > b$ then

$$\gcd(a^m - b^m,\, a^n - b^n) = a^{\gcd(m,n)} - b^{\gcd(m,n)}, \qquad 0 \le m < n.$$

(All variables are integers.) Hint: Use Euclid’s algorithm.

39 Let $S(m)$ be the smallest positive integer $n$ for which there exists an
increasing sequence of integers

$$m = a_1 < a_2 < \cdots < a_t = n$$

such that $a_1 a_2 \ldots a_t$ is a perfect square. (If $m$ is a perfect square, we
can let $t = 1$ and $n = m$.) For example, $S(2) = 6$ because the best such
sequence is $2 \cdot 3 \cdot 6$. We have

    n      1   2   3   4   5   6   7   8   9  10  11  12
    S(n)   1   6   8   4  10  12  14  15   9  18  22  20

Prove that $S(m) \ne S(m')$ whenever $0 < m < m'$.

40 If the radix $p$ representation of $n$ is $(a_m \ldots a_1 a_0)_p$, prove that

$$n!/p^{\epsilon_p(n!)} \equiv (-1)^{\epsilon_p(n!)}\, a_m! \ldots a_1!\, a_0! \pmod p.$$

(The left side is simply $n!$ with all $p$ factors removed. When $n = p$ this
reduces to Wilson’s theorem.)

41 a Show that if $p \bmod 4 = 3$, there is no integer $n$ such that $p$ divides
$n^2 + 1$. Hint: Use Fermat’s theorem.
b But show that if $p \bmod 4 = 1$, there is such an integer. Hint: Write
$(p - 1)!$ as $\bigl(\prod_{k=1}^{(p-1)/2} k(p - k)\bigr)$ and think about Wilson’s theorem.

42 Consider two fractions $m/n$ and $m'/n'$ in lowest terms. Prove that when
the sum $m/n + m'/n'$ is reduced to lowest terms, the denominator will be
$nn'$ if and only if $n \perp n'$. (In other words, $(mn' + m'n)/nn'$ will already
be in lowest terms if and only if $n$ and $n'$ have no common factor.)

43 There are $2^k$ nodes at level $k$ of the Stern-Brocot tree, corresponding to
the matrices $L^k, L^{k-1}R, \ldots, R^k$. Show that this sequence can be obtained
by starting with $L^k$ and then multiplying successively by

$$\begin{pmatrix} 0 & -1 \\ 1 & 2\rho(n) + 1 \end{pmatrix}$$

for $1 \le n < 2^k$, where $\rho(n)$ is the ruler function.

44 Prove that a baseball player whose batting average is .316 must have
batted at least 19 times. (If he has $m$ hits in $n$ times at bat, then
$m/n \in [.3155, .3165)$.)

45 The number 9376 has the peculiar self-reproducing property that

$$9376^2 = 87909376.$$

How many 4-digit numbers $x$ satisfy the equation $x^2 \bmod 10000 = x$?
How many $n$-digit numbers $x$ satisfy the equation $x^2 \bmod 10^n = x$?

46 a Prove that if $n^j \equiv 1$ and $n^k \equiv 1 \pmod m$, then $n^{\gcd(j,k)} \equiv 1$.
b Show that $2^n \not\equiv 1 \pmod n$, if $n > 1$. Hint: Consider the least prime
factor of $n$.

47 Show that if $n^{m-1} \equiv 1 \pmod m$ and if $n^{(m-1)/p} \not\equiv 1 \pmod m$ for all
primes such that $p \backslash (m - 1)$, then $m$ is prime. Hint: Show that if this
condition holds, the numbers $n^k \bmod m$ are distinct, for $1 \le k < m$.

48 Generalize Wilson’s theorem (4.49) by ascertaining the value of the
expression $\bigl(\prod_{1 \le n < m,\ n \perp m} n\bigr) \bmod m$, when $m > 1$.


Wilson’s  theorem: 
“Martha,  that  boy 
is  a menace.” 


Radio announcer:
“... pitcher Mark
LeChiffre hits a
two-run single!
Mark was batting
only .080, so he gets
his second hit of
the year.”
Anything wrong?


The  proof  that  large 
numbers  are  prime 
is  very  easy:  Let 
x be  a large  prime 
number;  then  x is 
prime,  QED. 

49 Let $R(N)$ be the number of pairs of integers $(m, n)$ such that $0 \le m < N$,
$0 \le n < N$, and $m \perp n$.
a Express $R(N)$ in terms of the $\Phi$ function.
b Prove that $R(N) = \sum_{d \ge 1} \lfloor N/d \rfloor^2 \mu(d)$.

50 Let $m$ be a positive integer and let

$$\omega = e^{2\pi i/m} = \cos(2\pi/m) + i \sin(2\pi/m).$$

We say that $\omega$ is an $m$th root of unity, since $\omega^m = e^{2\pi i} = 1$. In fact,
each of the $m$ complex numbers $\omega^0, \omega^1, \ldots, \omega^{m-1}$ is an $m$th root of
unity, because $(\omega^k)^m = e^{2\pi k i} = 1$; therefore $z - \omega^k$ is a factor of the
polynomial $z^m - 1$, for $0 \le k < m$. Since these factors are distinct, the
complete factorization of $z^m - 1$ over the complex numbers must be

$$z^m - 1 = \prod_{0 \le k < m} (z - \omega^k).$$

a Let $\Psi_m(z) = \prod_{0 \le k < m,\ k \perp m} (z - \omega^k)$. (This polynomial of degree
$\varphi(m)$ is called the cyclotomic polynomial of order $m$.) Prove that

$$z^m - 1 = \prod_{d \backslash m} \Psi_d(z).$$

b Prove that $\Psi_m(z) = \prod_{d \backslash m} (z^d - 1)^{\mu(m/d)}$.


What are the roots
of disunity?


Exam  problems 

51 Prove Fermat’s theorem (4.48) by expanding $(1 + 1 + \cdots + 1)^p$ via the
multinomial theorem.

52 Let $n$ and $x$ be positive integers such that $x$ has no divisors $\le n$ (except 1),
and let $p$ be a prime number. Prove that at least $\lfloor n/p \rfloor$ of the numbers
$\{x - 1, x^2 - 1, \ldots, x^{n-1} - 1\}$ are multiples of $p$.

53  Find  all  positive  integers  n such  that  n \ [(n  — l)!/(n  + 1)J. 

54 Determine the value of $1000! \bmod 10^{250}$ by hand calculation.

55 Let $P_n$ be the product of the first $n$ factorials, $\prod_{k=1}^{n} k!$. Prove that
$P_{2n}/P_n^4$ is an integer, for all positive integers $n$.

56  Show  that 


is  a power  of  2. 

57 Let $S(m,n)$ be the set of all integers $k$ such that

$$m \bmod k + n \bmod k \ge k.$$

For example, $S(7,9) = \{2, 4, 5, 8, 10, 11, 12, 13, 14, 15, 16\}$. Prove that

$$\sum_{k \in S(m,n)} \varphi(k) = mn.$$

Hint: Prove first that $\sum_{1 \le m \le n} \sum_{d \backslash m} \varphi(d) = \sum_{d \ge 1} \varphi(d) \lfloor n/d \rfloor$. Then
consider $\lfloor (m + n)/d \rfloor - \lfloor m/d \rfloor - \lfloor n/d \rfloor$.
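
The example in exercise 57 can be checked mechanically. The Python sketch below is only an illustration, not part of the exercise: the names `S` and `phi` are our own, `phi` is a naive totient that simply counts, and the search bound `m + n` assumes (as is easy to see) that no $k > m + n$ can satisfy the defining inequality.

```python
from math import gcd

def phi(k):
    """Euler's totient, computed naively by counting j in 1..k with gcd(j, k) = 1."""
    return sum(1 for j in range(1, k + 1) if gcd(j, k) == 1)

def S(m, n):
    """All k with (m mod k) + (n mod k) >= k; no k > m + n can qualify."""
    return [k for k in range(1, m + n + 1) if m % k + n % k >= k]

print(S(7, 9))
print(sum(phi(k) for k in S(7, 9)), 7 * 9)
```

A numerical check like this is evidence for the identity, not a proof of it.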

58 Let $f(m) = \sum_{d \backslash m} d$. Find a necessary and sufficient condition that $f(m)$
is a power of 2.

Bonus  problems 

59 Prove that if $x_1, \ldots, x_n$ are positive integers with $1/x_1 + \cdots + 1/x_n = 1$,
then $\max(x_1, \ldots, x_n) < e_n$. Hint: Prove the following stronger result by
induction: “If $1/x_1 + \cdots + 1/x_n + 1/\alpha = 1$, where $x_1, \ldots, x_n$ are positive
integers and $\alpha$ is a rational number $\ge \max(x_1, \ldots, x_n)$, then $\alpha + 1 \le e_{n+1}$
and $x_1 \ldots x_n (\alpha + 1) \le e_1 \ldots e_n e_{n+1}$.” (The proof is nontrivial.)

60 Prove that there’s a constant $P$ such that (4.18) gives only primes. You
may use the following (highly nontrivial) fact: There is a prime between
$p$ and $p + c\,p^{\theta}$, for some constant $c$ and all sufficiently large $p$, where
$\theta = \frac{1051}{1920}$.

61 Prove that if $m/n$, $m'/n'$, and $m''/n''$ are consecutive elements of $\mathcal{F}_N$,
then

$$m'' = \lfloor (n + N)/n' \rfloor\, m' - m,$$
$$n'' = \lfloor (n + N)/n' \rfloor\, n' - n.$$

(This recurrence allows us to compute the elements of $\mathcal{F}_N$ in order,
starting with $\frac{0}{1}$ and $\frac{1}{N}$.)

62 What binary number corresponds to $e$, in the binary $\leftrightarrow$ Stern-Brocot
correspondence? (Express your answer as an infinite sum; you need not
evaluate it in closed form.)

63 Show that if Fermat’s Last Theorem (4.46) is false, the least $n$ for which
it fails is prime. (You may assume that the result holds when $n = 4$.)
Furthermore, if $a^p + b^p = c^p$ and $a \perp b$, show that there exists an integer
$m$ such that

$$a + b = \begin{cases} m^p, & \text{if } p \not\backslash\, c; \\ p^{p-1} m^p, & \text{if } p \backslash c. \end{cases}$$

Thus $c$ must be really huge. Hint: Let $x = a + b$, and note that
$\gcd\bigl(x, (a^p + (x - a)^p)/x\bigr) = \gcd(x, p\,a^{p-1})$.

64 The Peirce sequence $P_N$ of order $N$ is an infinite string of fractions
separated by ‘<’ or ‘=’ signs, containing all the nonnegative fractions
$m/n$ with $m \ge 0$ and $n \le N$ (including fractions that are not reduced).
It is defined recursively by starting with

$$P_1 = 0/1 < 1/1 < 2/1 < 3/1 < 4/1 < 5/1 < \cdots.$$

For $N \ge 1$, we form $P_{N+1}$ by inserting two symbols just before the $kN$th
symbol of $P_N$, for all $k > 0$. The two inserted symbols are

$$\begin{array}{ll} \dfrac{k-1}{N+1}\ \ {=}\,, & \text{if } kN \text{ is odd;} \\[2ex] P_{N,kN}\ \ \dfrac{k-1}{N+1}\,, & \text{if } kN \text{ is even.} \end{array}$$

Here $P_{N,j}$ denotes the $j$th symbol of $P_N$, which will be either ‘<’ or ‘=’
when $j$ is even; it will be a fraction when $j$ is odd. For example,

$$\begin{aligned}
P_2 &= 0/2 = 0/1 < 1/2 < 2/2 = 1/1 < 3/2 < 4/2 = 2/1 < 5/2 < 6/2 = 3/1 < 7/2 < 8/2 = 4/1 < 9/2 < 10/2 = 5/1 < \cdots; \\
P_3 &= 0/2 = 0/3 = 0/1 < 1/3 < 1/2 < 2/3 < 2/2 = 3/3 = 1/1 < 4/3 < 3/2 < 5/3 < 4/2 = 6/3 = 2/1 < 7/3 < 5/2 < \cdots; \\
P_4 &= 0/2 = 0/4 = 0/3 = 0/1 < 1/4 < 1/3 < 2/4 = 1/2 < 2/3 < 3/4 < 2/2 = 4/4 = 3/3 = 1/1 < 5/4 < 4/3 < 6/4 = \cdots; \\
P_5 &= 0/2 = 0/4 = 0/5 = 0/3 = 0/1 < 1/5 < 1/4 < 1/3 < 2/5 < 2/4 = 1/2 < 3/5 < 2/3 < 3/4 < 4/5 < 2/2 = 4/4 = \cdots; \\
P_6 &= 0/2 = 0/4 = 0/6 = 0/5 = 0/3 = 0/1 < 1/6 < 1/5 < 1/4 < 2/6 = 1/3 < 2/5 < 2/4 = 3/6 = 1/2 < 3/5 < 4/6 = \cdots.
\end{aligned}$$

(Equal elements occur in a slightly peculiar order.) Prove that the ‘<’
and ‘=’ signs defined by the rules above correctly describe the relations
between adjacent fractions in the Peirce sequence.
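
The insertion rule of exercise 64 is easy to experiment with. The following Python sketch is an illustration under stated assumptions, not the book’s construction: it works on finite prefixes only, it takes $P_1 = 0/1 < 1/1 < 2/1 < \cdots$ as the starting sequence, and the names `next_peirce`, `peirce_prefix`, `show`, and `signs_are_correct` are hypothetical helpers of our own. Fractions are stored as unreduced (numerator, denominator) pairs.

```python
from fractions import Fraction

def next_peirce(symbols, N):
    """Build a prefix of P_{N+1} from a prefix of P_N (positions are 1-based)."""
    out = []
    for j, sym in enumerate(symbols, start=1):
        if j % N == 0:
            k = j // N
            new_fraction = (k - 1, N + 1)   # (numerator, denominator), kept unreduced
            if j % 2 == 1:
                out += [new_fraction, "="]  # kN odd: the fraction, then '='
            else:
                out += [sym, new_fraction]  # kN even: a copy of the sign P_{N,kN}, then the fraction
        out.append(sym)
    return out

def peirce_prefix(N, terms=12):
    """Prefix of P_N grown from the first `terms` fractions of P_1."""
    symbols = []
    for i in range(terms):
        if i > 0:
            symbols.append("<")
        symbols.append((i, 1))
    for order in range(1, N):
        symbols = next_peirce(symbols, order)
    return symbols

def show(symbols):
    """Render a prefix as text, for example '0/2 = 0/1 < 1/2'."""
    return " ".join(s if isinstance(s, str) else f"{s[0]}/{s[1]}" for s in symbols)

def signs_are_correct(symbols):
    """Compare every sign with the true relation between its two neighbours."""
    for i in range(1, len(symbols) - 1, 2):
        left, right = Fraction(*symbols[i - 1]), Fraction(*symbols[i + 1])
        actual = "<" if left < right else "=" if left == right else ">"
        if symbols[i] != actual:
            return False
    return True

for N in range(1, 7):
    prefix = peirce_prefix(N, terms=6)
    print(f"P_{N} = {show(prefix[:33])} ...")
```

Running `signs_are_correct` on such prefixes is a sanity check on the rule; it is of course no substitute for the proof the exercise asks for.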

Research  problems 

65  Are  the  Euclid  numbers  en  all  squarefree? 

66 Are the Mersenne numbers $2^p - 1$ all squarefree?

67 Prove or disprove that $\max_{1 \le j < k \le n} a_k/\gcd(a_j, a_k) \ge n$, for all sequences
of integers $0 < a_1 < \cdots < a_n$.

68 Is there a constant $Q$ such that $\lfloor Q^{2^n} \rfloor$ is prime for all $n \ge 0$?

69 Let $P_n$ denote the $n$th prime. Prove or disprove that
$P_{n+1} - P_n = O(\log P_n)^2$.

70 Does $\epsilon_3(n!) = \epsilon_2(n!)/2$ for infinitely many $n$?

71 Prove or disprove: If $k \ne 1$ there exists $n > 1$ such that $2^n \equiv k \pmod n$.
Are there infinitely many such $n$?

72 Prove or disprove: For all integers $a$, there exist infinitely many $n$ such
that $\varphi(n) \backslash (n + a)$.

73 If the $\Phi(n) + 1$ terms of the Farey series

$$\mathcal{F}_n = \langle \mathcal{F}_n(0), \mathcal{F}_n(1), \ldots, \mathcal{F}_n(\Phi(n)) \rangle$$

were fairly evenly distributed, we would expect $\mathcal{F}_n(k) \approx k/\Phi(n)$. Therefore
the sum $D(n) = \sum_{k=0}^{\Phi(n)} |\mathcal{F}_n(k) - k/\Phi(n)|$ measures the “deviation
of $\mathcal{F}_n$ from uniformity.” Is it true that $D(n) = O(n^{1/2+\epsilon})$ for all $\epsilon > 0$?

74 Approximately how many distinct values are there in the set $\{0! \bmod p,$
$1! \bmod p, \ldots, (p - 1)! \bmod p\}$, as $p \to \infty$?


5 


Lucky  us! 


Otherwise  known 
as  combinations  of 
n things,  k at  a 
time. 


Binomial  Coefficients 


LET'S  TAKE  A BREATHER.  The  previous  chapters  have  seen  some  heavy 
going,  with  sums  involving  floor,  ceiling,  mod,  phi,  and  mu  functions.  Now 
we’re  going  to  study  binomial  coefficients,  which  turn  out  to  be  (a)  more 
important  in  applications,  and  (b)  easier  to  manipulate,  than  all  those  other 
quantities. 


5.1  BASIC  IDENTITIES 

The symbol $\binom{n}{k}$ is a binomial coefficient, so called because of an important
property we look at later in this section, the binomial theorem. But we
read the symbol “n choose k.” This incantation arises from its combinatorial
interpretation: it is the number of ways to choose a k-element subset from
an n-element set. For example, from the set {1, 2, 3, 4} we can choose two
elements in six ways,

{1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4};

so $\binom{4}{2} = 6$.

To express the number $\binom{n}{k}$ in more familiar terms it’s easiest to first
determine the number of k-element sequences, rather than subsets, chosen
from an n-element set; for sequences, the order of the elements counts. We
use the same argument we used in Chapter 4 to show that n! is the number
of permutations of n objects. There are n choices for the first element of the
sequence; for each, there are n-1 choices for the second; and so on, until there
are n-k+1 choices for the kth. This gives $n(n-1)\ldots(n-k+1) = n^{\underline{k}}$ choices
in all. And since each k-element subset has exactly k! different orderings, this
number of sequences counts each subset exactly k! times. To get our answer,
we simply divide by k!:

$$\binom{n}{k} = \frac{n(n-1)\ldots(n-k+1)}{k(k-1)\ldots(1)}.$$

For example,

$$\binom{4}{2} = \frac{4 \cdot 3}{2 \cdot 1} = 6;$$

this agrees with our previous enumeration.
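
The counting argument can be confirmed by brute force. The Python sketch below is an illustration only; the helper names `count_sequences` and `count_subsets` are ours, and the enumeration relies on the standard `itertools` module rather than on anything in the text.

```python
from itertools import combinations, permutations
from math import factorial

def count_sequences(n, k):
    """Count k-element sequences (order matters) drawn from {1, ..., n}."""
    return sum(1 for _ in permutations(range(1, n + 1), k))

def count_subsets(n, k):
    """Count k-element subsets (order ignored) of {1, ..., n}."""
    return sum(1 for _ in combinations(range(1, n + 1), k))

# Each k-element subset is counted k! times among the sequences.
for n, k in [(4, 2), (5, 3), (6, 0)]:
    assert count_sequences(n, k) == count_subsets(n, k) * factorial(k)

print(list(combinations([1, 2, 3, 4], 2)))          # the six 2-element subsets
print(count_sequences(4, 2), count_subsets(4, 2))   # sequences versus subsets
```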

We call n the upper index and k the lower index. The indices are
restricted to be nonnegative integers by the combinatorial interpretation,
because sets don’t have negative or fractional numbers of elements. But the
binomial coefficient has many uses besides its combinatorial interpretation,
so we will remove some of the restrictions. It’s most useful, it turns out,
to allow an arbitrary real (or even complex) number to appear in the upper
index, and to allow an arbitrary integer in the lower. Our formal definition
therefore takes the following form:


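$$\binom{r}{k} = \begin{cases} \dfrac{r(r-1)\ldots(r-k+1)}{k(k-1)\ldots(1)} = \dfrac{r^{\underline{k}}}{k!}, & \text{integer } k \ge 0; \\[1ex] 0, & \text{integer } k < 0. \end{cases} \tag{5.1}$$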


This definition has several noteworthy features. First, the upper index is
called r, not n; the letter r emphasizes the fact that binomial coefficients make
sense when any real number appears in this position. For instance, we have
$\binom{-1}{3} = (-1)(-2)(-3)/(3 \cdot 2 \cdot 1) = -1$. There’s no combinatorial interpretation
here, but r = -1 turns out to be an important special case. A noninteger
index like r = -1/2 also turns out to be useful.
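
Definition (5.1) translates almost word for word into a program. The Python sketch below is one possible rendering, not a canonical implementation: the name `binom` is ours, and we assume the upper index is an integer or a `Fraction` so that all arithmetic stays exact.

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1): rational r, integer k."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        # Multiply in (r - j) / (j + 1); after k steps the product is r(r-1)...(r-k+1) / k!.
        result = result * (Fraction(r) - j) / (j + 1)
    return result

print(binom(4, 2))                 # an ordinary "4 choose 2"
print(binom(-1, 3))                # negative upper index
print(binom(Fraction(-1, 2), 2))   # noninteger upper index
print(binom(4, -1))                # negative lower index
```

Using `Fraction` rather than floating point is a design choice: identities can then be tested with exact equality.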

Second, we can view $\binom{r}{k}$ as a kth-degree polynomial in r. We’ll see that
this viewpoint is often helpful.

Third, we haven’t defined binomial coefficients for noninteger lower indices.
A reasonable definition can be given, but actual applications are rare,
so we will defer this generalization to later in the chapter.

Final note: We’ve listed the restrictions ‘integer $k \ge 0$’ and ‘integer
$k < 0$’ at the right of the definition. Such restrictions will be listed in all
the identities we will study, so that the range of applicability will be clear.
In  general  the  fewer  restrictions  the  better,  because  an  unrestricted  identity 
is  most  useful;  still,  any  restrictions  that  apply  are  an  important  part  of 
the  identity.  When  we  manipulate  binomial  coefficients,  it’s  easier  to  ignore 
difficult-to-remember  restrictions  temporarily  and  to  check  later  that  nothing 
has  been  violated.  But  the  check  needs  to  be  made. 

For example, almost every time we encounter $\binom{n}{n}$ it equals 1, so we can
get lulled into thinking that it’s always 1. But a careful look at definition (5.1)
tells us that $\binom{n}{n}$ is 1 only when $n \ge 0$ (assuming that n is an integer); when
$n < 0$ we have $\binom{n}{n} = 0$. Traps like this can (and will) make life adventuresome.

Binomial coefficients
were well known
in Asia, many
centuries before Pascal
was born [74], but
he had no way to
know that.


In  Italy  it’s  called 
Tartaglia’s  triangle. 


Before getting to the identities that we will use to tame binomial coefficients,
let’s take a peek at some small values. The numbers in Table 155 form
the beginning of Pascal’s triangle, named after Blaise Pascal (1623-1662)
because he wrote an influential treatise about them [227]. The empty entries
in this table are actually 0’s, because of a zero in the numerator of (5.1); for
example, $\binom{1}{2} = (1 \cdot 0)/(2 \cdot 1) = 0$. These entries have been left blank simply to
help emphasize the rest of the table.
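
A table of small values is easy to regenerate. The Python sketch below is a convenience for experimenting, not a reproduction of the book’s table: `pascal_rows` is a name of our own, and it assumes Python 3.8 or later for the standard function `math.comb`.

```python
from math import comb

def pascal_rows(n_max):
    """Rows 0..n_max of Pascal's triangle; row n lists C(n, 0), ..., C(n, n)."""
    return [[comb(n, k) for k in range(n + 1)] for n in range(n_max + 1)]

for n, row in enumerate(pascal_rows(10)):
    print(f"{n:2d} " + " ".join(f"{value:4d}" for value in row))

# Entries to the right of the diagonal are zero, as definition (5.1) predicts.
print(comb(1, 2))
```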

It’s worthwhile to memorize formulas for the first three columns,

$$\binom{r}{0} = 1, \qquad \binom{r}{1} = r, \qquad \binom{r}{2} = \frac{r(r-1)}{2}; \tag{5.2}$$


these hold for arbitrary reals. (Recall that $\binom{n+1}{2} = \frac12 n(n + 1)$ is the formula
we derived for triangular numbers in Chapter 1; triangular numbers are
conspicuously present in the $\binom{n}{2}$ column of Table 155.) It’s also a good idea to
memorize the first five rows or so of Pascal’s triangle, so that when the pattern
1, 4, 6, 4, 1 appears in some problem we will have a clue that binomial
coefficients probably lurk nearby.

The numbers in Pascal’s triangle satisfy, practically speaking, infinitely
many identities, so it’s not too surprising that we can find some surprising
relationships by looking closely. For example, there’s a curious “hexagon
property,” illustrated by the six numbers 56, 28, 36, 120, 210, 126 that
surround 84 in the lower right portion of Table 155. Both ways of multiplying
alternate numbers from this hexagon give the same product: $56 \cdot 36 \cdot 210 =
28 \cdot 120 \cdot 126 = 423360$. The same thing holds if we extract such a hexagon
from any other part of Pascal’s triangle.
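
The hexagon property is pleasant to test numerically. In the Python sketch below, `hexagon_products` is a hypothetical helper of ours; we assume the hexagon around $\binom{n}{k}$ consists of the entries at positions (n-1, k-1), (n-1, k), (n, k+1), (n+1, k+1), (n+1, k), (n, k-1), taken in cyclic order, which matches the six neighbors of 84 quoted above.

```python
from math import comb

def hexagon_products(n, k):
    """Products of alternate neighbours around C(n, k); needs n >= 1 and k >= 1."""
    first = comb(n - 1, k - 1) * comb(n, k + 1) * comb(n + 1, k)
    second = comb(n - 1, k) * comb(n + 1, k + 1) * comb(n, k - 1)
    return first, second

# The hexagon around C(9, 3) = 84: 28 * 126 * 120 versus 56 * 210 * 36.
print(hexagon_products(9, 3))
```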

And now the identities. Our goal in this section will be to learn a few
simple rules by which we can solve the vast majority of practical problems
involving binomial coefficients.

Definition  (5.1)  can  be  recast  in  terms  of  factorials  in  the  common  case 
that  the  upper  index  r is  an  integer,  n,  that’s  greater  than  or  equal  to  the 
lower  index  k: 


“C’est une chose
estrange combien
il est fertile en
proprietez.”

— B. Pascal [227]


$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad \text{integers } n \ge k \ge 0. \tag{5.3}$$


To get this formula, we just multiply the numerator and denominator of (5.1)
by $(n - k)!$. It’s occasionally useful to expand a binomial coefficient into this
factorial form (for example, when proving the hexagon property). And we
often want to go the other way, changing factorials into binomials.

The factorial representation hints at a symmetry in Pascal’s triangle:
Each row reads the same left-to-right as right-to-left. The identity reflecting
this, called the symmetry identity, is obtained by changing k to n - k:

$$\binom{n}{k} = \binom{n}{n-k}, \qquad \text{integer } n \ge 0,\ \text{integer } k. \tag{5.4}$$

This formula makes combinatorial sense, because by specifying the k chosen
things out of n we’re in effect specifying the n - k unchosen things.

The restriction that n and k be integers in identity (5.4) is obvious, since
each lower index must be an integer. But why can’t n be negative? Suppose,
for example, that n = -1. Is

$$\binom{-1}{k} = \binom{-1}{-1-k}$$

a valid equation? No. For instance, when k = 0 we get 1 on the left and 0 on
the right. In fact, for any integer $k \ge 0$ the left side is

$$\binom{-1}{k} = \frac{(-1)(-2)\ldots(-k)}{k!} = (-1)^k,$$

which is either 1 or -1; but the right side is 0, because the lower index is
negative. And for negative k the left side is 0 but the right side is

$$\binom{-1}{-1-k} = (-1)^{-1-k},$$

which is either 1 or -1. So the equation ‘$\binom{-1}{k} = \binom{-1}{-1-k}$’ is always false!
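
The failure of symmetry at n = -1 can be watched directly. The Python sketch below is illustrative only; `binom` is our own rendering of definition (5.1), restricted by assumption to integer or `Fraction` upper indices.

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1): rational r, integer k."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (Fraction(r) - j) / (j + 1)
    return result

# With upper index -1 the two sides never agree.
for k in range(-4, 5):
    print(k, binom(-1, k), binom(-1, -1 - k))

# With a nonnegative upper index they agree for every integer k, in range or not.
print(all(binom(5, k) == binom(5, 5 - k) for k in range(-4, 10)))
```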

The symmetry identity fails for all other negative integers n, too. But
unfortunately it’s all too easy to forget this restriction, since the expression
in the upper index is sometimes negative only for obscure (but legal) values
of its variables. Everyone who’s manipulated binomial coefficients much has
fallen into this trap at least three times.


I just hope I don’t
fall into this trap
during the midterm.


But the symmetry identity does have a big redeeming feature: It works
for all values of k, even when k < 0 or k > n. (Because both sides are zero in
such cases.) Otherwise $0 \le k \le n$, and symmetry follows immediately from
(5.3):

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n!}{\bigl(n-(n-k)\bigr)!\,(n-k)!} = \binom{n}{n-k}.$$

Our next important identity lets us move things in and out of binomial
coefficients:

$$\binom{r}{k} = \frac{r}{k}\binom{r-1}{k-1}, \qquad \text{integer } k \ne 0. \tag{5.5}$$


The restriction on k prevents us from dividing by 0 here. We call (5.5)
an absorption identity, because we often use it to absorb a variable into a
binomial coefficient when that variable is a nuisance outside. The equation
follows from definition (5.1), because $r^{\underline{k}} = r(r-1)^{\underline{k-1}}$ and $k! = k(k-1)!$ when
k > 0; both sides are zero when k < 0.

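If we multiply both sides of (5.5) by k, we get an absorption identity that
works even when k = 0:

$$k\binom{r}{k} = r\binom{r-1}{k-1}, \qquad \text{integer } k. \tag{5.6}$$

This one also has a companion that keeps the lower index intact:

$$(r-k)\binom{r}{k} = r\binom{r-1}{k}, \qquad \text{integer } k. \tag{5.7}$$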


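We can derive (5.7) by sandwiching an application of (5.6) between two
applications of symmetry:

$$\begin{aligned}
(r-k)\binom{r}{k} &= (r-k)\binom{r}{r-k} && \text{(by symmetry)} \\
&= r\binom{r-1}{r-k-1} && \text{(by (5.6))} \\
&= r\binom{r-1}{k} && \text{(by symmetry)}
\end{aligned}$$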


But wait a minute. We’ve claimed that the identity holds for all real r,
yet the derivation we just gave holds only when r is a positive integer. (The
upper index r - 1 must be a nonnegative integer if we’re to use the symmetry
property (5.4) with impunity.) Have we been cheating? No. It’s true that
the derivation is valid only for positive integers r; but we can claim that the
identity holds for all values of r, because both sides of (5.7) are polynomials
in r of degree k + 1. A nonzero polynomial of degree d or less can have at
most d distinct zeros; therefore the difference of two such polynomials, which
also has degree d or less, cannot be zero at more than d points unless it is
identically zero. In other words, if two polynomials of degree d or less agree
at more than d points, they must agree everywhere. We have shown that
$(r-k)\binom{r}{k} = r\binom{r-1}{k}$ whenever r is a positive integer; so these two polynomials
agree at infinitely many points, and they must be identically equal.


(Well, not here
anyway.)


The  proof  technique  in  the  previous  paragraph,  which  we  will  call  the 
polynomial  argument,  is  useful  for  extending  many  identities  from  integers 
to  reals;  we’ll  see  it  again  and  again.  Some  equations,  like  the  symmetry 
identity  (5.4),  are  not  identities  between  polynomials,  so  we  can’t  always  use 
this  method.  But  many  identities  do  have  the  necessary  form. 
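
The polynomial argument can be watched in action on identity (5.7). The Python sketch below is an illustration under assumptions of our own: `binom` renders definition (5.1) for integer or `Fraction` upper indices, and `lhs` and `rhs` are hypothetical names for the two sides of (5.7).

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1): rational r, integer k."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (Fraction(r) - j) / (j + 1)
    return result

def lhs(r, k):
    return (r - k) * binom(r, k)

def rhs(r, k):
    return r * binom(r - 1, k)

k = 3
# Both sides are polynomials in r of degree k + 1 = 4, so agreement at
# five positive integers already forces agreement everywhere.
assert all(lhs(r, k) == rhs(r, k) for r in range(1, k + 3))

# Spot checks away from the positive integers, where the derivation did not apply.
for r in [Fraction(-5, 2), Fraction(1, 3), Fraction(22, 7), -4, 0]:
    assert lhs(r, k) == rhs(r, k)
print("(5.7) agrees at all sampled values of r")
```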

For example, here’s another polynomial identity, perhaps the most important
binomial identity of all, known as the addition formula:

$$\binom{r}{k} = \binom{r-1}{k} + \binom{r-1}{k-1}, \qquad \text{integer } k. \tag{5.8}$$


When  r is  a positive  integer,  the  addition  formula  tells  us  that  every  number 
in  Pascal’s  triangle  is  the  sum  of  two  numbers  in  the  previous  row,  one  directly 
above  it  and  the  other  just  to  the  left.  And  the  formula  applies  also  when  r 
is  negative,  real,  or  complex;  the  only  restriction  is  that  k be  an  integer,  so 
that  the  binomial  coefficients  are  defined. 

One way to prove the addition formula is to assume that r is a positive
integer and to use the combinatorial interpretation. Recall that $\binom{r}{k}$ is the
number of possible k-element subsets chosen from an r-element set. If we
have a set of r eggs that includes exactly one bad egg, there are $\binom{r}{k}$ ways to
select k of the eggs. Exactly $\binom{r-1}{k}$ of these selections involve nothing but good
eggs; and $\binom{r-1}{k-1}$ of them contain the bad egg, because such selections have k-1
of the r - 1 good eggs. Adding these two numbers together gives (5.8). This
derivation assumes that r is a positive integer, and that $k \ge 0$. But both sides
of the identity are zero when k < 0, and the polynomial argument establishes
(5.8) in all remaining cases.
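
The bad-egg argument can be replayed by enumeration. In the Python sketch below, `split_by_bad_egg` is a name of our own and egg number 0 plays the bad egg by assumption; the check is limited to $k \ge 1$ because the standard `math.comb` rejects negative arguments.

```python
from itertools import combinations
from math import comb

def split_by_bad_egg(r, k):
    """Count k-egg selections from r eggs, split by whether bad egg 0 is included."""
    all_good = with_bad = 0
    for selection in combinations(range(r), k):
        if 0 in selection:
            with_bad += 1
        else:
            all_good += 1
    return all_good, with_bad

for r in range(1, 9):
    for k in range(1, r + 2):
        all_good, with_bad = split_by_bad_egg(r, k)
        assert all_good == comb(r - 1, k)        # nothing but good eggs
        assert with_bad == comb(r - 1, k - 1)    # bad egg plus k - 1 good ones
        assert all_good + with_bad == comb(r, k)

print(split_by_bad_egg(5, 3), comb(5, 3))
```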

We can also derive (5.8) by adding together the two absorption identities
(5.7) and (5.6):

$$(r-k)\binom{r}{k} + k\binom{r}{k} = r\binom{r-1}{k} + r\binom{r-1}{k-1};$$

the left side is $r\binom{r}{k}$, and we can divide through by r. This derivation is valid
for everything but r = 0, and it’s easy to check that remaining case.

Those of us who tend not to discover such slick proofs, or who are otherwise
into tedium, might prefer to derive (5.8) by a straightforward manipulation
of the definition. If k > 0,

$$\begin{aligned}
\binom{r-1}{k} + \binom{r-1}{k-1} &= \frac{(r-1)^{\underline{k}}}{k!} + \frac{(r-1)^{\underline{k-1}}}{(k-1)!} \\
&= \frac{(r-1)^{\underline{k-1}}(r-k)}{k!} + \frac{(r-1)^{\underline{k-1}}\,k}{k!} \\
&= \frac{(r-1)^{\underline{k-1}}\,r}{k!} = \frac{r^{\underline{k}}}{k!} = \binom{r}{k}.
\end{aligned}$$

Again, the cases for $k \le 0$ are easy to handle.

We’ve  just  seen  three  rather  different  proofs  of  the  addition  formula.  This 
is  not  surprising;  binomial  coefficients  have  many  useful  properties,  several  of 
which  are  bound  to  lead  to  proofs  of  an  identity  at  hand. 

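The addition formula is essentially a recurrence for the numbers of Pascal’s
triangle, so we’ll see that it is especially useful for proving other identities
by induction. We can also get a new identity immediately by unfolding the
recurrence. For example,

$$\begin{aligned}
\binom{5}{3} &= \binom{4}{3} + \binom{4}{2} \\
&= \binom{4}{3} + \binom{3}{2} + \binom{3}{1} \\
&= \binom{4}{3} + \binom{3}{2} + \binom{2}{1} + \binom{2}{0} \\
&= \binom{4}{3} + \binom{3}{2} + \binom{2}{1} + \binom{1}{0} + \binom{1}{-1}.
\end{aligned}$$

Since $\binom{1}{-1} = 0$, that term disappears and we can stop. This method yields
the general formula

$$\sum_{k \le n} \binom{r+k}{k} = \binom{r}{0} + \binom{r+1}{1} + \cdots + \binom{r+n}{n} = \binom{r+n+1}{n}, \qquad \text{integer } n. \tag{5.9}$$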

Notice that we don’t need the lower limit $k \ge 0$ on the index of summation,
because the terms with k < 0 are zero.
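
Identity (5.9) is stated for an arbitrary upper index, which invites a check away from the integers. The Python sketch below is illustrative; `binom` and `parallel_sum` are names of our own, and we assume rational values of r so that exact equality is meaningful.

```python
from fractions import Fraction

def binom(r, k):
    """Binomial coefficient per definition (5.1): rational r, integer k."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (Fraction(r) - j) / (j + 1)
    return result

def parallel_sum(r, n):
    """Left side of (5.9); terms with k < 0 vanish, so the sum may start at k = 0."""
    return sum(binom(r + k, k) for k in range(0, n + 1))

for r in [Fraction(-7, 3), -2, 0, Fraction(1, 2), 5]:
    for n in range(-2, 8):
        assert parallel_sum(r, n) == binom(r + n + 1, n)

print(parallel_sum(1, 3), binom(5, 3))   # the unfolded example: 1 + 2 + 3 + 4
```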

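This formula expresses one binomial coefficient as the sum of others whose
upper and lower indices stay the same distance apart. We found it by repeatedly
expanding the binomial coefficient with the smallest lower index: first
$\binom{5}{3}$, then $\binom{4}{2}$, then $\binom{3}{1}$, then $\binom{2}{0}$. What happens if we unfold the other way,
repeatedly expanding the one with largest lower index? We get

$$\begin{aligned}
\binom{5}{3} &= \binom{4}{3} + \binom{4}{2} \\
&= \binom{3}{3} + \binom{3}{2} + \binom{4}{2} \\
&= \binom{2}{3} + \binom{2}{2} + \binom{3}{2} + \binom{4}{2} \\
&= \binom{1}{3} + \binom{1}{2} + \binom{2}{2} + \binom{3}{2} + \binom{4}{2} \\
&= \binom{0}{3} + \binom{0}{2} + \binom{1}{2} + \binom{2}{2} + \binom{3}{2} + \binom{4}{2}.
\end{aligned}$$

Now $\binom{0}{3}$ is zero (so are $\binom{0}{2}$ and $\binom{1}{2}$, but these make the identity nicer), and
we can spot the general pattern:

$$\sum_{0 \le k \le n} \binom{k}{m} = \binom{0}{m} + \binom{1}{m} + \cdots + \binom{n}{m} = \binom{n+1}{m+1}, \qquad \text{integers } m, n \ge 0. \tag{5.10}$$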


This identity, which we call summation on the upper index, expresses a
binomial coefficient as the sum of others whose lower indices are constant. In
this case the sum needs the lower limit $k \ge 0$, because the terms with k < 0
aren’t zero. Also, m and n can’t in general be negative.

Identity (5.10) has an interesting combinatorial interpretation. If we want
to choose m + 1 tickets from a set of n + 1 tickets numbered 0 through n,
there are $\binom{k}{m}$ ways to do this when the largest ticket selected is number k.
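
The ticket interpretation can be tallied directly. In the Python sketch below, `selections_by_largest` is a hypothetical helper of ours; it enumerates every choice of m + 1 tickets from 0 through n and groups the choices by their largest ticket.

```python
from itertools import combinations
from math import comb

def selections_by_largest(m, n):
    """counts[k] = number of (m+1)-ticket selections from 0..n whose largest ticket is k."""
    counts = [0] * (n + 1)
    for chosen in combinations(range(n + 1), m + 1):
        counts[max(chosen)] += 1
    return counts

m, n = 2, 5
counts = selections_by_largest(m, n)
print(counts)                              # one entry per possible largest ticket
print(sum(counts), comb(n + 1, m + 1))     # both sides of (5.10)
```

Each entry `counts[k]` should match $\binom{k}{m}$, because the other m tickets must come from the k tickets numbered below k.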

We can prove both (5.9) and (5.10) by induction using the addition
formula, but we can also prove them from each other. For example, let’s
prove (5.9) from (5.10); our proof will illustrate some common binomial
coefficient manipulations. Our general plan will be to massage the left side
$\sum \binom{r+k}{k}$ of (5.9) so that it looks like the left side $\sum \binom{k}{m}$ of (5.10); then we’ll
invoke that identity, replacing the sum by a single binomial coefficient; finally
we’ll transform that coefficient into the right side of (5.9).

We can assume for convenience that r and n are nonnegative integers;
the general case of (5.9) follows from this special case, by the polynomial
argument. Let’s write m instead of r, so that this variable looks more like
a nonnegative integer. The plan can now be carried out systematically as
follows:

$$\begin{aligned}
\sum_{k \le n} \binom{m+k}{k} &= \sum_{-m \le k \le n} \binom{m+k}{k} \\
&= \sum_{-m \le k \le n} \binom{m+k}{m} \\
&= \sum_{0 \le k \le m+n} \binom{k}{m} \\
&= \binom{m+n+1}{m+1} = \binom{m+n+1}{n}.
\end{aligned}$$

Let’s look at this derivation blow by blow. The key step is in the second line,
where we apply the symmetry law (5.4) to replace $\binom{m+k}{k}$ by $\binom{m+k}{m}$. We’re
allowed to do this only when $m + k \ge 0$, so our first step restricts the range
of k by discarding the terms with k < -m. (This is legal because those terms
are zero.) Now we’re almost ready to apply (5.10); the third line sets this up,
replacing k by k - m and tidying up the range of summation. This step, like
the first, merely plays around with $\sum$-notation. Now k appears by itself in
the upper index and the limits of summation are in the proper form, so the
fourth line applies (5.10). One more use of symmetry finishes the job.

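Certain sums that we did in Chapters 1 and 2 were actually special cases
of (5.10), or disguised versions of this identity. For example, the case m = 1
gives the sum of the nonnegative integers up through n:

$$\binom{0}{1} + \binom{1}{1} + \cdots + \binom{n}{1} = 0 + 1 + \cdots + n = \frac{(n+1)n}{2} = \binom{n+1}{2}.$$

And the general case is equivalent to Chapter 2’s rule

$$\sum_{0 \le k \le n} k^{\underline{m}} = \frac{(n+1)^{\underline{m+1}}}{m+1}, \qquad \text{integers } m, n \ge 0,$$

if we divide both sides of this formula by m!. In fact, the addition formula
(5.8) tells us that

$$\Delta\left(\binom{x}{m}\right) = \binom{x+1}{m} - \binom{x}{m} = \binom{x}{m-1},$$

if we replace r and k respectively by x + 1 and m. Hence the methods of
Chapter 2 give us the handy indefinite summation formula

$$\sum \binom{x}{m}\,\delta x = \binom{x}{m+1} + C. \tag{5.11}$$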

Binomial coefficients get their name from the binomial theorem, which
deals with powers of the binomial expression x + y. Let’s look at the smallest
cases of this theorem:

$$\begin{aligned}
(x+y)^0 &= 1x^0y^0 \\
(x+y)^1 &= 1x^1y^0 + 1x^0y^1 \\
(x+y)^2 &= 1x^2y^0 + 2x^1y^1 + 1x^0y^2 \\
(x+y)^3 &= 1x^3y^0 + 3x^2y^1 + 3x^1y^2 + 1x^0y^3 \\
(x+y)^4 &= 1x^4y^0 + 4x^3y^1 + 6x^2y^2 + 4x^1y^3 + 1x^0y^4.
\end{aligned}$$

It’s not hard to see why these coefficients are the same as the numbers in
Pascal’s triangle: When we expand the product

$$(x+y)^n = \overbrace{(x+y)(x+y)\ldots(x+y)}^{n \text{ factors}},$$

every term is itself the product of n factors, each either an x or y. The number
of such terms with k factors of x and n - k factors of y is the coefficient
of $x^k y^{n-k}$ after we combine like terms. And this is exactly the number of
ways to choose k of the n binomials from which an x will be contributed; that
is, it’s $\binom{n}{k}$.
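
The expansion argument can be imitated by multiplying out the n factors one at a time. In the Python sketch below, `expand_power` is a name of our own, and a polynomial in x and y of total degree n is represented, by assumption, as a list whose entry k is the coefficient of $x^k y^{n-k}$.

```python
from math import comb

def expand_power(n):
    """Coefficients of (x + y)^n; entry k is the coefficient of x^k y^(n-k)."""
    coeffs = [1]                      # (x + y)^0 = 1
    for _ in range(n):
        new = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c               # take y from the new factor: power of x unchanged
            new[k + 1] += c           # take x from the new factor: power of x goes up
        coeffs = new
    return coeffs

for n in range(5):
    print(n, expand_power(n))

assert all(expand_power(n) == [comb(n, k) for k in range(n + 1)] for n in range(10))
```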

Some textbooks leave the quantity $0^0$ undefined, because the functions
$x^0$ and $0^x$ have different limiting values when x decreases to 0. But this is a
mistake. We must define

$$x^0 = 1, \qquad \text{for all } x,$$

if the binomial theorem is to be valid when x = 0, y = 0, and/or x = -y.
The theorem is too important to be arbitrarily restricted! By contrast, the
function $0^x$ is quite unimportant.

But what exactly is the binomial theorem? In its full glory it is the
following identity:

$$(x+y)^r = \sum_k \binom{r}{k} x^k y^{r-k}, \qquad \text{integer } r \ge 0 \text{ or } |x/y| < 1. \tag{5.12}$$


The sum is over all integers k; but it is really a finite sum when r is a nonnegative
integer, because all terms are zero except those with $0 \le k \le r$. On the
other hand, the theorem is also valid when r is negative, or even when r is
an arbitrary real or complex number. In such cases the sum really is infinite,
and we must have $|x/y| < 1$ to guarantee the sum’s absolute convergence.


“At the age
of twenty-one
he [Moriarty] wrote
a treatise upon the
Binomial Theorem,
which has had a
European vogue. On
the strength of it,
he won the
Mathematical Chair at
one of our smaller
Universities.”

— S.  Holmes  [71] 
(Chapter 9 tells the meaning of O.)


Two special cases of the binomial theorem are worth special attention,
even though they are extremely simple. If x = y = 1 and r = n is nonnegative,
we get

$$2^n = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n}, \qquad \text{integer } n \ge 0.$$

This equation tells us that row n of Pascal’s triangle sums to $2^n$. And when
x is −1 instead of +1, we get

$$0^n = \binom{n}{0} - \binom{n}{1} + \cdots + (-1)^n\binom{n}{n}, \qquad \text{integer } n \ge 0.$$

For example, 1 − 4 + 6 − 4 + 1 = 0; the elements of row n sum to zero if we
give them alternating signs, except in the top row (when n = 0 and $0^0 = 1$).
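Both special cases are easy to confirm by machine. The sketch below uses illustrative function names; note that Python’s integer power follows the same convention $0^0 = 1$ that the text insists on.

```python
from math import comb

def row_sum(n):
    """Sum of row n of Pascal's triangle."""
    return sum(comb(n, k) for k in range(n + 1))

def alternating_row_sum(n):
    """Row n of Pascal's triangle, summed with alternating signs."""
    return sum((-1) ** k * comb(n, k) for k in range(n + 1))

for n in range(6):
    # 0 ** 0 evaluates to 1 in Python, matching the top-row exception.
    print(n, row_sum(n) == 2 ** n, alternating_row_sum(n) == 0 ** n)
```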
When r is not a nonnegative integer, we most often use the binomial
theorem in the special case y = 1. Let’s state this special case explicitly,
writing z instead of x to emphasize the fact that an arbitrary complex number
can be involved here:

$$(1+z)^r = \sum_k \binom{r}{k} z^k, \qquad |z| < 1. \tag{5.13}$$

The general formula in (5.12) follows from this one if we set z = x/y and
multiply both sides by $y^r$.

We have proved the binomial theorem only when r is a nonnegative
integer, by using a combinatorial interpretation. We can’t deduce the general
case from the nonnegative-integer case by using the polynomial argument,
because the sum is infinite in the general case. But when r is arbitrary, we
can use Taylor series and the theory of complex variables:

$$f(z) = \frac{f(0)}{0!}z^0 + \frac{f'(0)}{1!}z^1 + \frac{f''(0)}{2!}z^2 + \cdots = \sum_{k\ge0} \frac{f^{(k)}(0)}{k!} z^k.$$

The derivatives of the function $f(z) = (1+z)^r$ are easily evaluated; in fact,
$f^{(k)}(z) = r^{\underline{k}}\,(1+z)^{r-k}$. Setting z = 0 gives (5.13).

We also need to prove that the infinite sum converges, when |z| < 1. It
does, because $\binom{r}{k} = O(k^{-1-r})$ by equation (5.83) below.

Now let’s look more closely at the values of $\binom{n}{k}$ when n is a negative
integer. One way to approach these values is to use the addition law (5.8) to
fill in the entries that lie above the numbers in Table 155, thereby obtaining
Table 164. For example, we must have $\binom{-1}{0} = 1$, since $\binom{0}{0} = \binom{-1}{0} + \binom{-1}{-1}$ and
$\binom{-1}{-1} = 0$; then we must have $\binom{-1}{1} = -1$, since $\binom{0}{1} = \binom{-1}{1} + \binom{-1}{0}$; and so on.
Table 164  Pascal’s triangle, extended upward. (The entry in row n, column k is $\binom{n}{k}$.)

  n   k=0    1    2    3    4    5    6     7    8     9   10
 -4     1   -4   10  -20   35  -56   84  -120  165  -220  286
 -3     1   -3    6  -10   15  -21   28   -36   45   -55   66
 -2     1   -2    3   -4    5   -6    7    -8    9   -10   11
 -1     1   -1    1   -1    1   -1    1    -1    1    -1    1
  0     1    0    0    0    0    0    0     0    0     0    0

All these numbers are familiar. Indeed, the rows and columns of
Table 164 appear as columns in Table 155 (but minus the minus signs). So
there must be a connection between the values of $\binom{n}{k}$ for negative n and the
values for positive n. The general rule is

$$\binom{r}{k} = (-1)^k \binom{k-r-1}{k}, \qquad \text{integer } k; \tag{5.14}$$

it is easily proved, since

$$r^{\underline{k}} = r(r-1)\ldots(r-k+1) = (-1)^k(-r)(1-r)\ldots(k-1-r) = (-1)^k (k-r-1)^{\underline{k}}$$

when k ≥ 0, and both sides are zero when k < 0.

Identity (5.14) is particularly valuable because it holds without any
restriction. (Of course, the lower index must be an integer so that the binomial
coefficients are defined.) The transformation in (5.14) is called negating the
upper index, or “upper negation.”
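Both the upward extension of Pascal’s triangle and upper negation can be checked mechanically. The sketch below is illustrative: `binom` and `row_above` are names chosen here, and `binom` is restricted to integer upper indices so that the arithmetic stays exact.

```python
from math import factorial

def binom(r, k):
    """Binomial coefficient for any integer upper index r and integer k."""
    if k < 0:
        return 0
    falling = 1
    for i in range(k):
        falling *= r - i             # r(r-1)...(r-k+1)
    return falling // factorial(k)   # exact, since k! divides the product

def row_above(row):
    """Recover row n-1 from row n via binom(n,k) = binom(n-1,k) + binom(n-1,k-1),
    starting from binom(n-1,-1) = 0."""
    above, left = [], 0
    for entry in row:
        left = entry - left
        above.append(left)
    return above

row = [1] + [0] * 10                 # row 0, columns 0 through 10
for n in range(0, -4, -1):
    row = row_above(row)             # now holds row n-1
    print(n - 1, row)

# Upper negation (5.14), tried on a small grid of integer r and k.
print(all(binom(r, k) == (-1) ** k * binom(k - r - 1, k)
          for r in range(-6, 7) for k in range(0, 9)))
```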

But how can we remember this important formula? The other identities
we’ve seen (symmetry, absorption, addition, etc.) are pretty simple, but
this one looks rather messy. Still, there’s a mnemonic that’s not too bad: To
negate the upper index, we begin by writing down $(-1)^k$, where k is the lower
index. (The lower index doesn’t change.) Then we immediately write k again,
twice, in both lower and upper index positions. Then we negate the original
upper index by subtracting it from the new upper index. And we complete
the job by subtracting 1 more (always subtracting, not adding, because this
is a negation process).

Let’s negate the upper index twice in succession, for practice. We get

$$\binom{r}{k} = (-1)^k \binom{k-r-1}{k} = (-1)^{2k} \binom{k-(k-r-1)-1}{k} = \binom{r}{k},$$


You  call  this  a 
mnemonic?  I'd  call 
it  pneumatic— 
full  of  air. 

It  does  help  me 
remember,  though. 


(Now  is  a good 
time  to  do  warmup 
exercise  4.) 
It’s also frustrating,
if we’re trying to
get somewhere else.


(Here double negation helps, because we’ve sandwiched
another operation in between.)


so  we’re  right  back  where  we  started.  This  is  probably  not  what  the  framers  of 
the  identity  intended;  but  it’s  reassuring  to  know  that  we  haven’t  gone  astray. 

Some applications of (5.14) are, of course, more useful than this. We can
use upper negation, for example, to move quantities between upper and lower
index positions. The identity has a symmetric formulation,

$$(-1)^m \binom{-n-1}{m} = (-1)^n \binom{-m-1}{n}, \qquad \text{integers } m, n \ge 0, \tag{5.15}$$

which holds because both sides are equal to $\binom{m+n}{n}$.

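Upper negation can also be used to derive the following interesting sum:

$$\sum_{k\le m} \binom{r}{k}(-1)^k = \binom{r}{0} - \binom{r}{1} + \cdots + (-1)^m\binom{r}{m} = (-1)^m \binom{r-1}{m}, \qquad \text{integer } m. \tag{5.16}$$

The idea is to negate the upper index, then apply (5.9), and negate again:

$$\sum_{k\le m} \binom{r}{k}(-1)^k = \sum_{k\le m} \binom{k-r-1}{k} = \binom{-r+m}{m} = (-1)^m \binom{r-1}{m}.$$

This formula gives us a partial sum of the rth row of Pascal’s triangle, provided
that the entries of the row have been given alternating signs. For instance, if
r = 5 and m = 2 the formula gives 1 − 5 + 10 = 6 = $(-1)^2\binom{4}{2}$.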

Notice that if m ≥ r, (5.16) gives the alternating sum of the entire row,
and this sum is zero when r is a positive integer. We proved this before, when
we expanded $(1-1)^r$ by the binomial theorem; it’s interesting to know that
the partial sums of this expression can also be evaluated in closed form.
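Identity (5.16) is simple to test by brute force. In this sketch the function names are illustrative, and integer upper indices (negative ones included) are used so that every value is exact.

```python
from math import factorial

def binom(r, k):
    """Binomial coefficient for integer r (possibly negative) and integer k."""
    if k < 0:
        return 0
    falling = 1
    for i in range(k):
        falling *= r - i
    return falling // factorial(k)

def alternating_partial_sum(r, m):
    """Left side of (5.16): sum over 0 <= k <= m of binom(r,k) (-1)^k."""
    return sum(binom(r, k) * (-1) ** k for k in range(m + 1))

def closed_form(r, m):
    """Right side of (5.16), for m >= 0."""
    return (-1) ** m * binom(r - 1, m)

print(alternating_partial_sum(5, 2), closed_form(5, 2))
```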

How about the simpler partial sum,

$$\sum_{k\le m} \binom{n}{k} = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{m}; \tag{5.17}$$

surely if we can evaluate the corresponding sum with alternating signs, we
ought to be able to do this one? But no; there is no closed form for the partial
sum of a row of Pascal’s triangle. We can do columns (that’s (5.10)) but
not rows. Curiously, however, there is a way to partially sum the row elements
if they have been multiplied by their distance from the center:

$$\sum_{k\le m} \binom{r}{k}\Bigl(\frac{r}{2} - k\Bigr) = \frac{m+1}{2}\binom{r}{m+1}, \qquad \text{integer } m. \tag{5.18}$$

(This formula is easily verified by induction on m.) The relation between
these partial sums with and without the factor of (r/2 − k) in the summand
is analogous to the relation between the integrals

$$\int_{-\infty}^{\alpha} x e^{-x^2}\,dx = -\tfrac12 e^{-\alpha^2} \qquad \text{and} \qquad \int_{-\infty}^{\alpha} e^{-x^2}\,dx.$$

The apparently more complicated integral on the left, with the factor of x,
has a closed form, while the simpler-looking integral on the right, without the
factor, has none. Appearances can be deceiving.
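Since (5.18) is claimed for arbitrary real r, an exact check with rational upper indices is a reasonable test. The sketch below uses Python’s `fractions` module; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def binom(r, k):
    """Generalized binomial coefficient as an exact Fraction; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    falling = Fraction(1)
    for i in range(k):
        falling *= r - i
    return falling / factorial(k)

def weighted_partial_sum(r, m):
    """Left side of (5.18): sum over 0 <= k <= m of binom(r,k) (r/2 - k)."""
    return sum(binom(r, k) * (Fraction(r) / 2 - k) for k in range(m + 1))

def closed_form(r, m):
    """Right side of (5.18)."""
    return Fraction(m + 1, 2) * binom(r, m + 1)

print(weighted_partial_sum(5, 2), closed_form(5, 2))
print(weighted_partial_sum(Fraction(7, 3), 4), closed_form(Fraction(7, 3), 4))
```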

At  the  end  of  this  chapter,  we’ll  study  a method  by  which  it’s  possible 
to  determine  whether  or  not  there  is  a closed  form  for  the  partial  sums  of  a 
given  series  involving  binomial  coefficients,  in  a fairly  general  setting.  This 
method  is  capable  of  discovering  identities  (5.16)  and  (5.18),  and  it  also  will 
tell  us  that  (5.17)  is  a dead  end. 

(Well, it actually equals $\frac12\sqrt{\pi}\,(1 + \operatorname{erf}\alpha)$,
a constant plus a multiple of the “error function” of α, if we’re willing
to accept that as a closed form.)

Partial sums of the binomial series lead to a curious relationship of
another kind:

$$\sum_{k\le m} \binom{m+r}{k} x^k y^{m-k} = \sum_{k\le m} \binom{-r}{k} (-x)^k (x+y)^{m-k}, \qquad \text{integer } m. \tag{5.19}$$


This identity isn’t hard to prove by induction: Both sides are zero when
m < 0 and 1 when m = 0. If we let $S_m$ stand for the sum on the left, we can
apply the addition formula (5.8) and show easily that

$$S_m = \sum_{k\le m} \binom{m-1+r}{k} x^k y^{m-k} + \sum_{k\le m} \binom{m-1+r}{k-1} x^k y^{m-k};$$

and

$$\sum_{k\le m} \binom{m-1+r}{k} x^k y^{m-k} = y S_{m-1} + \binom{m-1+r}{m} x^m,$$

$$\sum_{k\le m} \binom{m-1+r}{k-1} x^k y^{m-k} = x S_{m-1},$$

when m > 0. Hence

$$S_m = (x+y) S_{m-1} + \binom{-r}{m} (-x)^m,$$

and this recurrence is satisfied also by the right-hand side of (5.19). By
induction, both sides must be equal; QED.

But there’s a neater proof. When r is an integer in the range 0 ≥ r ≥ −m,
the binomial theorem tells us that both sides of (5.19) are $(x+y)^{m+r} y^{-r}$. And
since both sides are polynomials in r of degree m or less, agreement at m + 1
different values is enough (but just barely!) to prove equality in general.
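Identity (5.19) can also be spot-checked with exact rational arithmetic, including values of r well outside the range used in the polynomial argument. The names `lhs` and `rhs` in this sketch are illustrative.

```python
from fractions import Fraction
from math import factorial

def binom(r, k):
    """Generalized binomial coefficient as an exact Fraction; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    falling = Fraction(1)
    for i in range(k):
        falling *= r - i
    return falling / factorial(k)

def lhs(m, r, x, y):
    """Left side of (5.19); terms with k < 0 vanish, so k runs over 0..m."""
    return sum(binom(m + r, k) * x**k * y**(m - k) for k in range(m + 1))

def rhs(m, r, x, y):
    """Right side of (5.19)."""
    return sum(binom(-r, k) * (-x)**k * (x + y)**(m - k) for k in range(m + 1))

x, y = Fraction(2), Fraction(-1, 3)
for r in (Fraction(1, 2), -3, Fraction(-7, 3)):
    print(r, lhs(4, r, x, y) == rhs(4, r, x, y))
```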

It may seem foolish to have an identity where one sum equals another.
Neither side is in closed form. But sometimes one side turns out to be easier
to evaluate than the other. For example, if we set x = −1 and y = 1, we get

$$\sum_{k\le m} \binom{m+r}{k}(-1)^k = \binom{-r}{m}, \qquad \text{integer } m \ge 0,$$

an alternative form of identity (5.16). And if we set x = y = 1 and r = m + 1,
we get

$$\sum_{k\le m} \binom{2m+1}{k} = \sum_{k\le m} \binom{m+k}{k} 2^{m-k}.$$

The left-hand side sums just half of the binomial coefficients with upper index
2m + 1, and these are equal to their counterparts in the other half because
Pascal’s triangle has left-right symmetry. Hence the left-hand side is just
$\frac12 2^{2m+1} = 2^{2m}$. This yields a formula that is quite unexpected,

$$\sum_{k\le m} \binom{m+k}{k} 2^{-k} = 2^m, \qquad \text{integer } m \ge 0. \tag{5.20}$$

Let’s check it when m = 2: $\binom{2}{0} + \frac12\binom{3}{1} + \frac14\binom{4}{2} = 1 + \frac32 + \frac64 = 4$. Astounding.
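The check for m = 2 extends to as many values of m as we care to try. A small sketch, with an illustrative function name and exact fractions:

```python
from fractions import Fraction
from math import comb

def half_weighted_sum(m):
    """Left side of (5.20): sum over 0 <= k <= m of binom(m+k, k) / 2^k."""
    return sum(Fraction(comb(m + k, k), 2 ** k) for k in range(m + 1))

for m in range(6):
    print(m, half_weighted_sum(m), 2 ** m)
```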

So  far  we’ve  been  looking  either  at  binomial  coefficients  by  themselves  or 
at  sums  of  terms  in  which  there’s  only  one  binomial  coefficient  per  term.  But 
many  of  the  challenging  problems  we  face  involve  products  of  two  or  more 
binomial  coefficients,  so  we’ll  spend  the  rest  of  this  section  considering  how 
to  deal  with  such  cases. 

Here’s a handy rule that often helps to simplify the product of two
binomial coefficients:

$$\binom{r}{m}\binom{m}{k} = \binom{r}{k}\binom{r-k}{m-k}, \qquad \text{integers } m, k. \tag{5.21}$$

We’ve already seen the special case k = 1; it’s the absorption identity (5.6).
Although both sides of (5.21) are products of binomial coefficients, one side
often is easier to sum because of interactions with the rest of a formula. For
example, the left side uses m twice, the right side uses it only once. Therefore
we usually want to replace $\binom{r}{m}\binom{m}{k}$ by $\binom{r}{k}\binom{r-k}{m-k}$ when summing on m.
Equation (5.21) holds primarily because of cancellation between m!’s in
the factorial representations of $\binom{r}{m}$ and $\binom{m}{k}$. If all variables are integers and
r ≥ m ≥ k ≥ 0, we have

$$\binom{r}{m}\binom{m}{k} = \frac{r!}{m!\,(r-m)!}\,\frac{m!}{k!\,(m-k)!} = \frac{r!}{k!\,(m-k)!\,(r-m)!} = \frac{r!}{k!\,(r-k)!}\,\frac{(r-k)!}{(m-k)!\,(r-m)!} = \binom{r}{k}\binom{r-k}{m-k}.$$

That was easy. Furthermore, if m < k or k < 0, both sides of (5.21) are
zero; so the identity holds for all integers m and k. Finally, the polynomial
argument extends its validity to all real r.
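The extension to all integers m and k, and to non-integer r, can be confirmed on a grid of sample values. This is a sketch with an illustrative function name; rational r stands in for “all real r.”

```python
from fractions import Fraction
from math import factorial

def binom(r, k):
    """Generalized binomial coefficient as an exact Fraction; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    falling = Fraction(1)
    for i in range(k):
        falling *= r - i
    return falling / factorial(k)

def trinomial_revision_holds(r, m, k):
    """True when both sides of (5.21) agree for these arguments."""
    return binom(r, m) * binom(m, k) == binom(r, k) * binom(r - k, m - k)

print(all(trinomial_revision_holds(r, m, k)
          for r in (Fraction(7, 2), -3, 6)
          for m in range(-2, 7) for k in range(-2, 7)))
```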

A binomial coefficient $\binom{r}{k} = r!/(r-k)!\,k!$ can be written in the form
(a + b)!/a! b! after a suitable renaming of variables. Similarly, the quantity
in the middle of the derivation above, r!/k! (m − k)! (r − m)!, can be written
in the form (a + b + c)!/a! b! c!. This is a “trinomial coefficient,” which arises
in the “trinomial theorem”:

$$(x+y+z)^n = \sum_{\substack{0\le a,b,c\le n\\ a+b+c=n}} \frac{(a+b+c)!}{a!\,b!\,c!}\, x^a y^b z^c = \sum_{\substack{0\le a,b,c\le n\\ a+b+c=n}} \binom{a+b+c}{b+c}\binom{b+c}{c}\, x^a y^b z^c.$$

So $\binom{r}{m}\binom{m}{k}$ is really a trinomial coefficient in disguise. Trinomial coefficients
pop up occasionally in applications, and we can conveniently write them as

$$\binom{a+b+c}{a,\,b,\,c} = \frac{(a+b+c)!}{a!\,b!\,c!}$$

in order to emphasize the symmetry present.

Binomial and trinomial coefficients generalize to multinomial
coefficients, which are always expressible as products of binomial coefficients:

$$\binom{a_1+a_2+\cdots+a_m}{a_1,\,a_2,\,\ldots,\,a_m} = \frac{(a_1+a_2+\cdots+a_m)!}{a_1!\,a_2!\ldots a_m!} = \binom{a_1+a_2+\cdots+a_m}{a_2+\cdots+a_m} \cdots \binom{a_{m-1}+a_m}{a_m}.$$


Yeah,  right. 


“Excogitavi autem olim mirabilem regulam pro numeris coefficientibus
potestatum, non tantum a binomio x + y, sed et a trinomio x + y + z,
imo a polynomio quocunque, ut data potentia gradus cujuscunque v. gr.
decimi, et potentia in ejus valore comprehensa, ut $x^5y^3z^2$, possim
statim assignare numerum coefficientem, quem habere debet, sine
ulla Tabula jam calculata.”

— G.W.  Leibniz  [200] 


Therefore,  when  we  run  across  such  a beastie,  our  standard  techniques  apply. 
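The reduction of a multinomial coefficient to a product of binomial coefficients is easy to mirror in code. This is a sketch with illustrative function names, restricted to nonnegative integer arguments.

```python
from math import comb, factorial, prod

def multinomial(*parts):
    """(a1 + ... + am)! / (a1! ... am!) for nonnegative integers a1, ..., am."""
    return factorial(sum(parts)) // prod(factorial(a) for a in parts)

def multinomial_via_binomials(*parts):
    """The same value, as the product of binom(a_i+...+a_m, a_{i+1}+...+a_m)."""
    result, tail = 1, sum(parts)
    for a in parts[:-1]:
        result *= comb(tail, tail - a)   # split off a_i from what remains
        tail -= a
    return result

# The coefficient of x^5 y^3 z^2 in (x+y+z)^10, the example in Leibniz's note.
print(multinomial(5, 3, 2), multinomial_via_binomials(5, 3, 2))
```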
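Table 169  Sums of products of binomial coefficients.

$$\sum_k \binom{r}{m+k}\binom{s}{n-k} = \binom{r+s}{m+n}, \qquad \text{integers } m, n. \tag{5.22}$$

$$\sum_k \binom{l}{m+k}\binom{s}{n+k} = \binom{l+s}{l-m+n}, \qquad \text{integer } l \ge 0,\ \text{integers } m, n. \tag{5.23}$$

$$\sum_k \binom{l}{m+k}\binom{s+k}{n}(-1)^k = (-1)^{l+m}\binom{s-m}{n-l}, \qquad \text{integer } l \ge 0,\ \text{integers } m, n. \tag{5.24}$$

$$\sum_{k\le l} \binom{l-k}{m}\binom{s}{k-n}(-1)^k = (-1)^{l+m}\binom{s-m-1}{l-m-n}, \qquad \text{integers } l, m, n \ge 0. \tag{5.25}$$

$$\sum_{0\le k\le l} \binom{l-k}{m}\binom{q+k}{n} = \binom{l+q+1}{m+n+1}, \qquad \text{integers } l, m \ge 0,\ \text{integers } n \ge q \ge 0. \tag{5.26}$$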

Fold down the
corner on this page,
so you can find the
table quickly later.
You’ll need it!


Now we come to Table 169, which lists identities that are among the most
important of our standard techniques. These are the ones we rely on when
struggling with a sum involving a product of two binomial coefficients. Each
of these identities is a sum over k, with one appearance of k in each binomial
coefficient; there also are four nearly independent parameters, called m, n, r,
etc., one in each index position. Different cases arise depending on whether k
appears in the upper or lower index, and on whether it appears with a plus or
minus sign. Sometimes there’s an additional factor of $(-1)^k$, which is needed
to make the terms summable in closed form.

Table  169  is  far  too  complicated  to  memorize  in  full;  it  is  intended  only 
for  reference.  But  the  first  identity  in  this  table  is  by  far  the  most  memorable, 
and  it  should  be  remembered.  It  states  that  the  sum  (over  all  integers  k)  of  the 
product  of  two  binomial  coefficients,  in  which  the  upper  indices  are  constant 
and  the  lower  indices  have  a constant  sum  for  all  k,  is  the  binomial  coefficient 
obtained  by  summing  both  lower  and  upper  indices.  This  identity  is  known 
as  Vandermonde’s  convolution,  because  Alexandre  Vandermonde  wrote  a 
significant  paper  about  it  in  the  late  1700s  [293];  it  was,  however,  known 
to  Chu  Shih-Chieh  in  China  as  early  as  1303.  All  of  the  other  identities  in 
Table  169  can  be  obtained  from  Vandermonde’s  convolution  by  doing  things 
like  negating  upper  indices  or  applying  the  symmetry  law,  etc.,  with  care; 
therefore  Vandermonde’s  convolution  is  the  most  basic  of  all. 
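Before relying on a table like this, it is reassuring to test every line of it numerically. The sketch below is a brute-force check, not a proof. It assumes the five identities in the forms (5.22) through (5.26), and it assumes that the summation window `K` is wide enough to contain every nonzero term for the small parameters tried (outside it, one of the two coefficients in each product vanishes). All names are illustrative.

```python
from fractions import Fraction
from math import factorial

def binom(r, k):
    """Generalized binomial coefficient as an exact Fraction; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    falling = Fraction(1)
    for i in range(k):
        falling *= r - i
    return falling / factorial(k)

def sign(k):
    """(-1)^k, valid for negative k as well."""
    return -1 if k % 2 else 1

K = range(-12, 13)   # assumed wide enough for the small parameters used here

def table_169_holds(l, m, n, q, r, s):
    """True if (5.22)-(5.26) all hold for these parameter values."""
    pm = sign(l + m)
    return all([
        sum(binom(r, m + k) * binom(s, n - k) for k in K) == binom(r + s, m + n),
        sum(binom(l, m + k) * binom(s, n + k) for k in K) == binom(l + s, l - m + n),
        sum(binom(l, m + k) * binom(s + k, n) * sign(k) for k in K)
            == pm * binom(s - m, n - l),
        sum(binom(l - k, m) * binom(s, k - n) * sign(k) for k in K if k <= l)
            == pm * binom(s - m - 1, l - m - n),
        sum(binom(l - k, m) * binom(q + k, n) for k in range(l + 1))
            == binom(l + q + 1, m + n + 1),
    ])

print(table_169_holds(3, 1, 2, 1, Fraction(5, 2), Fraction(-1, 3)))
```

Here l, m, n, q are kept as small nonnegative integers with n ≥ q, so that the side conditions of all five identities are met at once, while r and s are arbitrary rationals.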

We can prove Vandermonde’s convolution by giving it a nice combinatorial
interpretation. If we replace k by k − m and n by n − m, we can assume
that m = 0; hence the identity to be proved is

$$\sum_k \binom{r}{k}\binom{s}{n-k} = \binom{r+s}{n}, \qquad \text{integer } n. \tag{5.27}$$

Let r and s be nonnegative integers; the general case then follows by the
polynomial argument. On the right side, $\binom{r+s}{n}$ is the number of ways to
choose n people from among r men and s women. On the left, each term
of the sum is the number of ways to choose k of the men and n − k of the
women. Summing over all k counts each possibility exactly once.
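The counting argument can be acted out for small groups. In the sketch below (illustrative names; the people are just labeled tuples), every committee of n people is generated and filed according to how many men it contains.

```python
from itertools import combinations
from math import comb

def committees_by_men(r, s, n):
    """Count the n-person committees from r men and s women, by number of men."""
    people = [("M", i) for i in range(r)] + [("W", i) for i in range(s)]
    counts = [0] * (n + 1)
    for group in combinations(people, n):
        men = sum(1 for kind, _ in group if kind == "M")
        counts[men] += 1
    return counts

counts = committees_by_men(4, 3, 3)
print(counts, sum(counts), comb(4 + 3, 3))
```

Each entry `counts[k]` should equal $\binom{r}{k}\binom{s}{n-k}$, and the entries together should account for all $\binom{r+s}{n}$ committees.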

Much more often than not we use these identities left to right, since that’s
the  direction  of  simplification.  But  every  once  in  a while  it  pays  to  go  the 
other  direction,  temporarily  making  an  expression  more  complicated.  When 
this  works,  we’ve  usually  created  a double  sum  for  which  we  can  interchange 
the  order  of  summation  and  then  simplify. 

Before moving on let’s look at proofs for two more of the identities in
Table 169. It’s easy to prove (5.23); all we need to do is replace the first
binomial coefficient by $\binom{l}{l-m-k}$, then Vandermonde’s (5.22) applies.

The next one, (5.24), is a bit more difficult. We can reduce it to
Vandermonde’s convolution by a sequence of transformations, but we can just
as easily prove it by resorting to the old reliable technique of mathematical
induction. Induction is often the first thing to try when nothing else obvious
jumps out at us, and induction on l works just fine here.

For the basis l = 0, all terms are zero except when k = −m; so both sides
of the equation are $(-1)^m\binom{s-m}{n}$. Now suppose that the identity holds for all
values less than some fixed l, where l > 0. We can use the addition formula
to replace $\binom{l}{m+k}$ by $\binom{l-1}{m+k} + \binom{l-1}{m+k-1}$; the original sum now breaks into two
sums, each of which can be evaluated by the induction hypothesis:

$$\sum_k \binom{l-1}{m+k}\binom{s+k}{n}(-1)^k + \sum_k \binom{l-1}{m+k-1}\binom{s+k}{n}(-1)^k = (-1)^{l-1+m}\binom{s-m}{n-l+1} + (-1)^{l+m}\binom{s-m+1}{n-l+1}.$$

And this simplifies to the right-hand side of (5.24), if we apply the addition
formula once again.

Two things about this derivation are worthy of note. First, we see again
the great convenience of summing over all integers k, not just over a certain
range, because there’s no need to fuss over boundary conditions. Second,
the addition formula works nicely with mathematical induction, because it’s
a recurrence for binomial coefficients. A binomial coefficient whose upper
index is l is expressed in terms of two whose upper indices are l − 1, and
that’s exactly what we need to apply the induction hypothesis.


Sexist!  You  men- 
tioned men  first, 
So much for Table 169. What about sums with three or more binomial
coefficients? If the index of summation is spread over all the coefficients, our
chances of finding a closed form aren’t great: Only a few closed forms are
known for sums of this kind, hence the sum we need might not match the
given specs. One of these rarities, proved in exercise 43, is

$$\sum_k \binom{m-r+s}{k}\binom{n+r-s}{n-k}\binom{r+k}{m+n} = \binom{r}{m}\binom{s}{n}, \qquad \text{integers } m, n \ge 0. \tag{5.28}$$

Here’s another, more symmetric example:

$$\sum_k \binom{a+b}{a+k}\binom{b+c}{b+k}\binom{c+a}{c+k}(-1)^k = \frac{(a+b+c)!}{a!\,b!\,c!}, \qquad \text{integers } a, b, c \ge 0. \tag{5.29}$$

This one has a two-coefficient counterpart,

$$\sum_k \binom{a+b}{a+k}\binom{b+a}{b+k}(-1)^k = \frac{(a+b)!}{a!\,b!}, \qquad \text{integers } a, b \ge 0, \tag{5.30}$$

which incidentally doesn’t appear in Table 169. The analogous four-coefficient
sum doesn’t have a closed form, but a similar sum does:


$$\sum_k (-1)^k \binom{a+b}{a+k}\binom{b+c}{b+k}\binom{c+d}{c+k}\binom{d+a}{d+k} \bigg/ \binom{2a+2b+2c+2d}{a+b+c+d+k}$$

$$\qquad = \frac{(a+b+c+d)!\,(a+b+c)!\,(a+b+d)!\,(a+c+d)!\,(b+c+d)!}{(2a+2b+2c+2d)!\,(a+c)!\,(b+d)!\,a!\,b!\,c!\,d!}, \qquad \text{integers } a, b, c, d \ge 0.$$

This  was  discovered  by  John  Dougall  [69]  early  in  the  twentieth  century. 
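Identities with this many coefficients are easy to mistranscribe, so a brute-force check on small arguments is worthwhile. The sketch below tests (5.29) and Dougall’s identity in the forms stated above; all function names are illustrative, and only nonnegative integer arguments are handled.

```python
from fractions import Fraction
from math import comb, factorial

def choose(n, k):
    """binom(n, k) for integer n >= 0; zero when k lies outside 0..n."""
    return comb(n, k) if 0 <= k <= n else 0

def sign(k):
    """(-1)^k, valid for negative k as well."""
    return -1 if k % 2 else 1

def sum_5_29(a, b, c):
    """Left side of (5.29); all terms with |k| > a+b+c vanish."""
    span = range(-(a + b + c), a + b + c + 1)
    return sum(sign(k) * choose(a + b, a + k) * choose(b + c, b + k)
               * choose(c + a, c + k) for k in span)

def dougall_lhs(a, b, c, d):
    n = a + b + c + d
    total = Fraction(0)
    for k in range(-n, n + 1):
        top = (choose(a + b, a + k) * choose(b + c, b + k)
               * choose(c + d, c + k) * choose(d + a, d + k))
        total += Fraction(sign(k) * top, comb(2 * n, n + k))
    return total

def dougall_rhs(a, b, c, d):
    f = factorial
    num = f(a + b + c + d) * f(a + b + c) * f(a + b + d) * f(a + c + d) * f(b + c + d)
    den = f(2 * (a + b + c + d)) * f(a + c) * f(b + d) * f(a) * f(b) * f(c) * f(d)
    return Fraction(num, den)

print(sum_5_29(1, 2, 3), factorial(6) // (factorial(1) * factorial(2) * factorial(3)))
print(dougall_lhs(1, 2, 1, 2), dougall_rhs(1, 2, 1, 2))
```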

Is Dougall’s identity the hairiest sum of binomial coefficients known? No!
The champion so far is

$$\sum_{k_{ij}} (-1)^{\sum_{i<j} k_{ij}} \Biggl(\prod_{1\le i<j<n} \binom{a_i+a_j}{a_j+k_{ij}}\Biggr) \Biggl(\prod_{1\le j<n} \binom{a_j+a_n}{a_n + \sum_{i<j} k_{ij} - \sum_{i>j} k_{ji}}\Biggr) = \binom{a_1+\cdots+a_n}{a_1,\,a_2,\,\ldots,\,a_n},$$

$$\text{integers } a_1, a_2, \ldots, a_n \ge 0. \tag{5.31}$$

Here the sum is over $\binom{n-1}{2}$ index variables $k_{ij}$ for 1 ≤ i < j < n. Equation
(5.29) is the special case n = 3; the case n = 4 can be written out as follows,
if we use (a, b, c, d) for $(a_1, a_2, a_3, a_4)$ and (i, j, k) for $(k_{12}, k_{13}, k_{23})$:

$$\sum_{i,j,k} (-1)^{i+j+k} \binom{a+b}{b+i}\binom{a+c}{c+j}\binom{b+c}{c+k}\binom{a+d}{d-i-j}\binom{b+d}{d+i-k}\binom{c+d}{d+j+k} = \frac{(a+b+c+d)!}{a!\,b!\,c!\,d!}, \qquad \text{integers } a, b, c, d \ge 0.$$

The left side of (5.31) is the coefficient of $z_1^0 z_2^0 \ldots z_n^0$ after the product of
n(n − 1) fractions

$$\prod_{\substack{1\le i,j\le n\\ i\ne j}} \Bigl(1 - \frac{z_i}{z_j}\Bigr)^{a_i}$$

has been fully expanded into positive and negative powers of the z’s. The
right side of (5.31) was conjectured by Freeman Dyson in 1962 and proved by
several people shortly thereafter. Exercise 86 gives a “simple” proof of (5.31).
Another noteworthy identity involving lots of binomial coefficients is

$$\sum_{j,k} (-1)^{j+k} \binom{j+k}{j}\binom{r}{j}\binom{n}{k}\binom{m+n-j-k}{m-j} = \binom{n+r}{n}\binom{m-r}{m-n}, \qquad \text{integers } m, n \ge 0. \tag{5.32}$$

This one, proved in exercise 83, even has a chance of arising in practical
applications. But we’re getting far afield from our theme of “basic identities,”
so we had better stop and take stock of what we’ve learned.

We’ve seen that binomial coefficients satisfy an almost bewildering
variety of identities. Some of these, fortunately, are easily remembered, and
we can use the memorable ones to derive most of the others in a few steps.
Table 174 collects ten of the most useful formulas, all in one place; these are
the best identities to know.


5.2  BASIC  PRACTICE 

In the previous section we derived a bunch of identities by manipulating
sums and plugging in other identities. It wasn’t too tough to find those
derivations; we knew what we were trying to prove, so we could formulate
a general plan and fill in the details without much trouble. Usually, however,
out in the real world, we’re not faced with an identity to prove; we’re faced
with a sum to simplify. And we don’t know what a simplified form might
look like (or even if one exists). By tackling many such sums in this section
and the next, we will hone our binomial coefficient tools.
Algorithm self-teach:
1 read problem
2 attempt solution
3 skim book solution
4 if attempt failed
    goto 1
  else goto next problem


Unfortunately, that algorithm can put you in an infinite loop.
Suggested patches:
0  set c ← 0
3a set c ← c + 1
3b if c = N
     goto your TA

   (E. W. Dijkstra)


... But this subchapter is called
BASIC practice.

To start, let’s try our hand at a few sums involving a single binomial
coefficient.

Problem 1: A sum of ratios.

We’d like to have a closed form for

$$\sum_{k=0}^{m} \binom{m}{k} \bigg/ \binom{n}{k}, \qquad \text{integers } n \ge m \ge 0.$$

At first glance this sum evokes panic, because we haven’t seen any
identities that deal with a quotient of binomial coefficients. (Furthermore the sum
involves two binomial coefficients, which seems to contradict the sentence
preceding this problem.) However, just as we can use the factorial
representations to reexpress a product of binomial coefficients as another product
(that’s how we got identity (5.21)), we can do likewise with a quotient. In
fact we can avoid the grubby factorial representations by letting r = n and
dividing both sides of equation (5.21) by $\binom{n}{k}\binom{n}{m}$; this yields

$$\binom{m}{k} \bigg/ \binom{n}{k} = \binom{n-k}{m-k} \bigg/ \binom{n}{m}.$$

So we replace the quotient on the left, which appears in our sum, by the one
on the right; the sum becomes

$$\sum_{k=0}^{m} \binom{n-k}{m-k} \bigg/ \binom{n}{m}.$$

We still have a quotient, but the binomial coefficient in the denominator
doesn’t involve the index of summation k, so we can remove it from the sum.
We’ll restore it later.

We can also simplify the boundary conditions by summing over all k ≥ 0;
the terms for k > m are zero. The sum that’s left isn’t so intimidating:

$$\sum_{k\ge0} \binom{n-k}{m-k}.$$

It’s similar to the one in identity (5.9), because the index k appears twice
with the same sign. But here it’s −k and in (5.9) it’s not. The next step
should therefore be obvious; there’s only one reasonable thing to do:

$$\sum_{k\ge0} \binom{n-k}{m-k} = \sum_{m-k\ge0} \binom{n-(m-k)}{m-(m-k)} = \sum_{k\le m} \binom{n-m+k}{k}.$$
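Table 174  The top ten binomial coefficient identities.

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad \text{integers } n \ge k \ge 0. \qquad \text{(factorial expansion)}$$

$$\binom{n}{k} = \binom{n}{n-k}, \qquad \text{integer } n \ge 0,\ \text{integer } k. \qquad \text{(symmetry)}$$

$$\binom{r}{k} = \frac{r}{k}\binom{r-1}{k-1}, \qquad \text{integer } k \ne 0. \qquad \text{(absorption/extraction)}$$

$$\binom{r}{k} = \binom{r-1}{k} + \binom{r-1}{k-1}, \qquad \text{integer } k. \qquad \text{(addition/induction)}$$

$$\binom{r}{k} = (-1)^k\binom{k-r-1}{k}, \qquad \text{integer } k. \qquad \text{(upper negation)}$$

$$\binom{r}{m}\binom{m}{k} = \binom{r}{k}\binom{r-k}{m-k}, \qquad \text{integers } m, k. \qquad \text{(trinomial revision)}$$

$$\sum_k \binom{r}{k} x^k y^{r-k} = (x+y)^r, \qquad \text{integer } r \ge 0 \text{ or } |x/y| < 1. \qquad \text{(binomial theorem)}$$

$$\sum_{k\le n} \binom{r+k}{k} = \binom{r+n+1}{n}, \qquad \text{integer } n. \qquad \text{(parallel summation)}$$

$$\sum_{0\le k\le n} \binom{k}{m} = \binom{n+1}{m+1}, \qquad \text{integers } m, n \ge 0. \qquad \text{(upper summation)}$$

$$\sum_k \binom{r}{k}\binom{s}{n-k} = \binom{r+s}{n}, \qquad \text{integer } n. \qquad \text{(Vandermonde convolution)}$$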

And now we can apply the parallel summation identity, (5.9):

$$\sum_{k\le m} \binom{n-m+k}{k} = \binom{(n-m)+m+1}{m} = \binom{n+1}{m}.$$

Finally we reinstate the $\binom{n}{m}$ in the denominator that we removed from
the sum earlier, and then apply (5.7) to get the desired closed form:

$$\binom{n+1}{m} \bigg/ \binom{n}{m} = \frac{n+1}{n+1-m}.$$

This derivation actually works for any real value of n, as long as no division
by zero occurs; that is, as long as n isn’t one of the integers 0, 1, ..., m − 1.
Please, don’t remind me of the
midterm.


The more complicated the derivation, the more important it is to check
the answer. This one wasn’t too complicated but we’ll check anyway. In the
small case m = 2 and n = 4 we have

$$\binom{2}{0} \bigg/ \binom{4}{0} + \binom{2}{1} \bigg/ \binom{4}{1} + \binom{2}{2} \bigg/ \binom{4}{2} = 1 + \frac12 + \frac16 = \frac53;$$

yes, this agrees perfectly with our closed form (4 + 1)/(4 + 1 − 2).
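One small case is good; a few hundred are better. The sketch below, with illustrative function names, compares the sum of ratios with the closed form for all integers n ≥ m ≥ 0 up to a modest bound, using exact fractions.

```python
from fractions import Fraction
from math import comb

def ratio_sum(m, n):
    """Sum over 0 <= k <= m of binom(m,k) / binom(n,k), for integers n >= m >= 0."""
    return sum(Fraction(comb(m, k), comb(n, k)) for k in range(m + 1))

def closed_form(m, n):
    return Fraction(n + 1, n + 1 - m)

print(ratio_sum(2, 4), closed_form(2, 4))
print(all(ratio_sum(m, n) == closed_form(m, n)
          for n in range(20) for m in range(n + 1)))
```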


Problem  2:  From  the  literature  of  sorting. 

Our next sum appeared way back in ancient times (the early 1970s)
before people were fluent with binomial coefficients. A paper that introduced
an improved merging technique [165] concludes with the following remarks:
“It can be shown that the expected number of saved transfers . . is given by
the expression

$$T = \sum_{r=0}^{n} r\,\frac{{}_{m-r-1}C_{m-n-1}}{{}_{m}C_{n}}.$$

Here m and n are as defined above, and ${}_mC_n$ is the symbol for the number
of combinations of m objects taken n at a time. . . . The author is grateful to
the referee for reducing a more complex equation for expected transfers saved
to the form given here.”

We’ll see that this is definitely not a final answer to the author’s problem.
It’s not even a midterm answer.

First we should translate the sum into something we can work with; the
ghastly notation ${}_{m-r-1}C_{m-n-1}$ is enough to stop anybody, save the
enthusiastic referee (please). In our language we’d write


$$T = \sum_{k=0}^{n} k \binom{m-k-1}{m-n-1} \bigg/ \binom{m}{n}, \qquad \text{integers } m > n \ge 0.$$

The binomial coefficient in the denominator doesn’t involve the index of
summation, so we can remove it and work with the new sum

$$S = \sum_{k=0}^{n} k \binom{m-k-1}{m-n-1}.$$

What next? The index of summation appears in the upper index of the
binomial coefficient but not in the lower index. So if the other k weren’t there,
we could massage the sum and apply summation on the upper index (5.10).
With the extra k, though, we can’t. If we could somehow absorb that k into
the binomial coefficient, using one of our absorption identities, we could then
sum on the upper index. Unfortunately those identities don’t work here. But
if the k were instead m − k, we could use absorption identity (5.6):

$$(m-k)\binom{m-k-1}{m-n-1} = (m-n)\binom{m-k}{m-n}.$$

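So here’s the key: We’ll rewrite k as m − (m − k) and split the sum S
into two sums:

$$\begin{aligned}
\sum_{k=0}^{n} k \binom{m-k-1}{m-n-1} &= \sum_{k=0}^{n} \bigl(m - (m-k)\bigr)\binom{m-k-1}{m-n-1} \\
&= \sum_{k=0}^{n} m\binom{m-k-1}{m-n-1} - \sum_{k=0}^{n} (m-k)\binom{m-k-1}{m-n-1} \\
&= m\sum_{k=0}^{n} \binom{m-k-1}{m-n-1} - \sum_{k=0}^{n} (m-n)\binom{m-k}{m-n} \\
&= mA - (m-n)B,
\end{aligned}$$

where

$$A = \sum_{k=0}^{n} \binom{m-k-1}{m-n-1}, \qquad B = \sum_{k=0}^{n} \binom{m-k}{m-n}.$$

The sums A and B that remain are none other than our old friends in
which the upper index varies while the lower index stays fixed. Let’s do B
first, because it looks simpler. A little bit of massaging is enough to make the
summand match the left side of (5.10):

$$\sum_{0\le k\le n} \binom{m-k}{m-n} = \sum_{0\le m-k\le n} \binom{m-(m-k)}{m-n} = \sum_{m-n\le k\le m} \binom{k}{m-n} = \sum_{0\le k\le m} \binom{k}{m-n}.$$

In the last step we’ve included the terms with 0 ≤ k < m − n in the sum;
they’re all zero, because the upper index is less than the lower. Now we sum
on the upper index, using (5.10), and get

$$B = \sum_{0\le k\le m} \binom{k}{m-n} = \binom{m+1}{m-n+1}.$$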
The other sum A is the same, but with m replaced by m − 1. Hence we
have a closed form for the given sum S, which can be further simplified:

$$\begin{aligned}
S = mA - (m-n)B &= m\binom{m}{m-n} - (m-n)\binom{m+1}{m-n+1} \\
&= \Bigl(m - (m-n)\frac{m+1}{m-n+1}\Bigr)\binom{m}{m-n} \\
&= \Bigl(\frac{n}{m-n+1}\Bigr)\binom{m}{m-n}.
\end{aligned}$$

And this gives us a closed form for the original sum:

$$T = S \bigg/ \binom{m}{n} = \frac{n}{m-n+1}\binom{m}{m-n} \bigg/ \binom{m}{n} = \frac{n}{m-n+1}.$$

Even the referee can’t simplify this.

Again we use a small case to check the answer. When m = 4 and n = 2,
we have

$$T = 0\cdot\binom{3}{1} \bigg/ \binom{4}{2} + 1\cdot\binom{2}{1} \bigg/ \binom{4}{2} + 2\cdot\binom{1}{1} \bigg/ \binom{4}{2} = 0 + \frac26 + \frac26 = \frac23,$$

which agrees with our formula 2/(4 − 2 + 1).
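Here too the closed form can be compared with the defining sum over a range of cases. The function name below is illustrative; it evaluates T exactly as written in our notation, for integers m > n ≥ 0.

```python
from fractions import Fraction
from math import comb

def expected_saved_transfers(m, n):
    """T = sum_{k=0}^{n} k * binom(m-k-1, m-n-1) / binom(m, n), for m > n >= 0."""
    total = sum(k * comb(m - k - 1, m - n - 1) for k in range(n + 1))
    return Fraction(total, comb(m, n))

print(expected_saved_transfers(4, 2), Fraction(2, 4 - 2 + 1))
print(all(expected_saved_transfers(m, n) == Fraction(n, m - n + 1)
          for m in range(1, 20) for n in range(m)))
```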

Problem  3:  From  an  old  exam. 

Let’s do one more sum that involves a single binomial coefficient. This
one, unlike the last, originated in the halls of academia; it was a problem on
a take-home test. We want the value of $Q_{1000000}$, when

$$Q_n = \sum_{k\le 2^n} \binom{2^n-k}{k}(-1)^k, \qquad \text{integer } n \ge 0.$$

Do old exams
ever die?

This one’s harder than the others; we can’t apply any of the identities we’ve
seen so far. And we’re faced with a sum of $2^{1000000}$ terms, so we can’t just
add them up. The index of summation k appears in both indices, upper and
lower, but with opposite signs. Negating the upper index doesn’t help, either;
it removes the factor of $(-1)^k$, but it introduces a 2k in the upper index.

When nothing obvious works, we know that it’s best to look at small
cases. If we can’t spot a pattern and prove it by induction, at least we’ll have
some data for checking our results. Here are the nonzero terms and their sums
for the first four values of n.

 n   $Q_n$
 0   $\binom{1}{0}$ = 1 = 1
 1   $\binom{2}{0} - \binom{1}{1}$ = 1 − 1 = 0
 2   $\binom{4}{0} - \binom{3}{1} + \binom{2}{2}$ = 1 − 3 + 1 = −1
 3   $\binom{8}{0} - \binom{7}{1} + \binom{6}{2} - \binom{5}{3} + \binom{4}{4}$ = 1 − 7 + 15 − 10 + 1 = 0

We’d better not try the next case, n = 4; the chances of making an arithmetic
error are too high. (Computing terms like $\binom{12}{4}$ and $\binom{11}{5}$ by hand, let alone
combining them with the others, is worthwhile only if we’re desperate.)

So the pattern starts out 1, 0, −1, 0. Even if we knew the next term or
two, the closed form wouldn’t be obvious. But if we could find and prove a
recurrence for $Q_n$ we’d probably be able to guess and prove its closed form.
To find a recurrence, we need to relate $Q_n$ to $Q_{n-1}$ (or to $Q_{\text{smaller values}}$); but
to do this we need to relate a term like $\binom{128-13}{13}$, which arises when n = 7 and
k = 13, to terms like $\binom{64-13}{13}$. This doesn’t look promising; we don’t know
any neat relations between entries in Pascal’s triangle that are 64 rows apart.
The addition formula, our main tool for induction proofs, only relates entries
that are one row apart.

But this leads us to a key observation: There’s no need to deal with entries that are $2^n$ rows apart. The variable $n$ never appears by itself, it’s always in the context $2^n$. So the $2^n$ is a red herring! If we replace $2^n$ by $m$, all we need to do is find a closed form for the more general (but easier) sum

$$R_m = \sum_{k\le m}\binom{m-k}{k}(-1)^k, \qquad\text{integer } m\ge 0;$$

then we’ll also have a closed form for $Q_n = R_{2^n}$. And there’s a good chance that the addition formula will give us a recurrence for the sequence $R_m$.

Oh, the sneakiness of the instructor who set that exam.

Values of $R_m$ for small $m$ can be read from Table 155, if we alternately add and subtract values that appear in a southwest-to-northeast diagonal. The results are:

$$\begin{array}{c|rrrrrrrrrrr}
m & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline
R_m & 1 & 1 & 0 & -1 & -1 & 0 & 1 & 1 & 0 & -1 & -1
\end{array}$$

There  seems  to  be  a lot  of  cancellation  going  on. 

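Let’s look now at the formula for $R_m$ and see if it defines a recurrence. Our strategy is to apply the addition formula (5.8) and to find sums that have the form $R_k$ in the resulting expression, somewhat as we did in the perturbation method of Chapter 2:

$$\begin{aligned}
R_m &= \sum_{k\le m}\binom{m-k}{k}(-1)^k \\
&= \sum_{k\le m}\binom{m-1-k}{k}(-1)^k + \sum_{k\le m}\binom{m-1-k}{k-1}(-1)^k \\
&= \sum_{k\le m}\binom{m-1-k}{k}(-1)^k + \sum_{k+1\le m}\binom{m-2-k}{k}(-1)^{k+1} \\
&= \sum_{k\le m-1}\binom{m-1-k}{k}(-1)^k + \binom{-1}{m}(-1)^m - \sum_{k\le m-2}\binom{m-2-k}{k}(-1)^k - \binom{-1}{m-1}(-1)^{m-1} \\
&= R_{m-1} + (-1)^{2m} - R_{m-2} - (-1)^{2(m-1)} \;=\; R_{m-1} - R_{m-2}.
\end{aligned}$$

(In the next-to-last step we’ve used the formula $\binom{-1}{m} = (-1)^m$, which we know is true when $m \ge 0$.) This derivation is valid for $m \ge 2$.

Anyway those of us who’ve done warmup exercise 4 know it.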

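From this recurrence we can generate values of $R_m$ quickly, and we soon perceive that the sequence is periodic. Indeed,

$$R_m = \begin{cases}
\phantom{-}1 & \text{if } m \bmod 6 = 0; \\
\phantom{-}1 & \text{if } m \bmod 6 = 1; \\
\phantom{-}0 & \text{if } m \bmod 6 = 2; \\
-1 & \text{if } m \bmod 6 = 3; \\
-1 & \text{if } m \bmod 6 = 4; \\
\phantom{-}0 & \text{if } m \bmod 6 = 5.
\end{cases}$$

The proof by induction is by inspection. Or, if we must give a more academic proof, we can unfold the recurrence one step to obtain

$$R_m = (R_{m-2} - R_{m-3}) - R_{m-2} = -R_{m-3},$$

whenever $m \ge 3$. Hence $R_m = R_{m-6}$ whenever $m \ge 6$.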

Finally, since $Q_n = R_{2^n}$, we can determine $Q_n$ by determining $2^n \bmod 6$ and using the closed form for $R_m$. When $n = 0$ we have $2^0 \bmod 6 = 1$; after that we keep multiplying by 2 (mod 6), so the pattern 2, 4 repeats. Thus

$$Q_n = R_{2^n} = \begin{cases}
R_1 = 1, & \text{if } n = 0; \\
R_2 = 0, & \text{if } n \text{ is odd}; \\
R_4 = -1, & \text{if } n > 0 \text{ is even}.
\end{cases}$$

This closed form for $Q_n$ agrees with the first four values we calculated when we started on the problem. We conclude that $Q_{1000000} = R_4 = -1$.
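The periodic pattern is easy to spot-check by machine. The following Python sketch is our own illustration (the names `R`, `R_closed`, and `Q_closed` are not from the text); it sums $R_m$ directly for small $m$, compares the result with the period-6 closed form, and then evaluates $Q_{1000000}$ by reducing $2^n$ modulo 6.

```python
from math import comb

def R(m: int) -> int:
    """R_m summed directly; C(m-k, k) is nonzero only for 0 <= k <= m/2."""
    return sum(comb(m - k, k) * (-1) ** k for k in range(m // 2 + 1))

def R_closed(m: int) -> int:
    """Period-6 closed form that follows from R_m = R_{m-1} - R_{m-2}."""
    return (1, 1, 0, -1, -1, 0)[m % 6]

def Q_closed(n: int) -> int:
    """Q_n = R_{2^n}; pow(2, n, 6) yields 2^n mod 6 without forming 2^n."""
    return R_closed(pow(2, n, 6))

# Direct summation agrees with the closed form for every small m we try.
for m in range(200):
    assert R(m) == R_closed(m)

print([R(2 ** n) for n in range(6)])  # Q_0 .. Q_5 by brute force
print(Q_closed(1_000_000))
```

Direct summation is hopeless for $n = 1000000$, so the last line relies entirely on the closed form; the brute-force values only confirm it where brute force is feasible.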
Problem 4: A sum involving two binomial coefficients.

Our next task is to find a closed form for

$$\sum_{k=0}^{n} k\binom{m-k-1}{m-n-1}, \qquad\text{integers } m > n \ge 0.$$


Wait  a minute.  Where’s  the  second  binomial  coefficient  promised  in  the  title 
of  this  problem?  And  why  should  we  try  to  simplify  a sum  we’ve  already 
simplified?  (This  is  the  sum  S from  Problem  2.) 

Well, this is a sum that’s easier to simplify if we view the summand as a product of two binomial coefficients, and then use one of the general identities found in Table 169. The second binomial coefficient materializes when we rewrite $k$ as $\binom{k}{1}$:

$$\sum_{k=0}^{n} k\binom{m-k-1}{m-n-1} = \sum_{0\le k\le n}\binom{k}{1}\binom{m-k-1}{m-n-1}.$$

And identity (5.26) is the one to apply, since its index of summation appears in both upper indices and with opposite signs.

But our sum isn’t quite in the correct form yet. The upper limit of summation should be $m-1$, if we’re to have a perfect match with (5.26). No problem; the terms for $n < k \le m-1$ are zero. So we can plug in, with $(l, m, n, q) \leftarrow (m-1, m-n-1, 1, 0)$; the answer is

$$S = \binom{m}{m-n+1}.$$

This is cleaner than the formula we got before. We can convert it to the previous formula by using (5.7):

$$\binom{m}{m-n+1} = \frac{n}{m-n+1}\binom{m}{m-n}.$$

Similarly, we can get interesting results by plugging special values into the other general identities we’ve seen. Suppose, for example, that we set $m = n = 1$ and $q = 0$ in (5.26). Then the identity reads

$$\sum_{0\le k\le l}(l-k)k = \binom{l+1}{3}.$$

The left side is $l\bigl((l+1)l/2\bigr) - (1^2 + 2^2 + \cdots + l^2)$, so this gives us a brand new way to solve the sum-of-squares problem that we beat to death in Chapter 2.

The moral of this story is: Special cases of very general sums are sometimes best handled in the general form. When learning general forms, it’s wise to learn their simple specializations.
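Both specializations of (5.26) used in this problem can be checked mechanically. The sketch below is only an illustration with names of our own choosing (`S_direct`, `S_closed`, `sum_of_squares`); it compares the sum with $\binom{m}{m-n+1}$ for small $m > n \ge 0$ and recovers the sum of squares from $\sum (l-k)k = \binom{l+1}{3}$.

```python
from math import comb

def S_direct(m: int, n: int) -> int:
    """Sum of k*C(m-k-1, m-n-1) over 0 <= k <= n, assuming m > n >= 0."""
    return sum(k * comb(m - k - 1, m - n - 1) for k in range(n + 1))

def S_closed(m: int, n: int) -> int:
    """Closed form C(m, m-n+1) obtained by plugging into (5.26)."""
    return comb(m, m - n + 1)

def sum_of_squares(l: int) -> int:
    """1^2 + ... + l^2, solved from l*(l+1)*l/2 - (sum of squares) = C(l+1, 3)."""
    return l * (l + 1) * l // 2 - comb(l + 1, 3)

for m in range(1, 15):
    for n in range(m):
        assert S_direct(m, n) == S_closed(m, n)

for l in range(20):
    assert sum_of_squares(l) == sum(k * k for k in range(l + 1))

print(S_direct(4, 2), S_closed(4, 2))
```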
So we should deep six this, right?


Problem  5:  A sum  with  three  factors. 

Here’s another sum that isn’t too bad. We wish to simplify

$$\sum_k\binom{n}{k}\binom{s}{k}k, \qquad\text{integer } n \ge 0.$$

The index of summation $k$ appears in both lower indices and with the same sign; therefore identity (5.23) in Table 169 looks close to what we need. With a bit of manipulation, we should be able to use it.

The biggest difference between (5.23) and what we have is the extra $k$ in our sum. But we can absorb $k$ into one of the binomial coefficients by using one of the absorption identities:

$$\sum_k\binom{n}{k}\binom{s}{k}k = \sum_k\binom{n}{k}\binom{s-1}{k-1}s = s\sum_k\binom{n}{k}\binom{s-1}{k-1}.$$

We don’t care that the $s$ appears when the $k$ disappears, because it’s constant. And now we’re ready to apply the identity and get the closed form,

$$s\sum_k\binom{n}{k}\binom{s-1}{k-1} = s\binom{n+s-1}{n-1}.$$

If we had chosen in the first step to absorb $k$ into $\binom{n}{k}$, not $\binom{s}{k}$, we wouldn’t have been allowed to apply (5.23) directly, because $n-1$ might be negative; the identity requires a nonnegative value in at least one of the upper indices.
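Since $s$ may be any real number here, a check with exact rational arithmetic is a reasonable sanity test. The following sketch is an illustration only; the helper `binom` (a generalized binomial coefficient with rational upper index) and the names `lhs` and `rhs` are ours, not the text’s.

```python
from fractions import Fraction
from math import factorial

def binom(r, k: int) -> Fraction:
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k!; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    product = Fraction(1)
    for i in range(k):
        product *= Fraction(r) - i
    return product / factorial(k)

def lhs(n: int, s) -> Fraction:
    """Sum of C(n,k) C(s,k) k; terms with k > n vanish since n is a nonnegative integer."""
    return sum((binom(n, k) * binom(s, k) * k for k in range(n + 1)), Fraction(0))

def rhs(n: int, s) -> Fraction:
    """Closed form s * C(n+s-1, n-1)."""
    return Fraction(s) * binom(Fraction(s) + n - 1, n - 1)

for n in range(8):
    for s in (Fraction(-7, 3), Fraction(0), Fraction(1, 2), Fraction(5), Fraction(22, 7)):
        assert lhs(n, s) == rhs(n, s)

print(lhs(2, Fraction(1, 2)), rhs(2, Fraction(1, 2)))
```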

Problem  6:  A sum  with  menacing  characteristics. 

The next sum is more challenging. We seek a closed form for

$$\sum_{k\ge0}\binom{n+k}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1}, \qquad\text{integer } n \ge 0.$$


One useful measure of a sum’s difficulty is the number of times the index of summation appears. By this measure we’re in deep trouble: $k$ appears six times. Furthermore, the key step that worked in the previous problem (to absorb something outside the binomial coefficients into one of them) won’t work here. If we absorb the $k+1$ we just get another occurrence of $k$ in its place. And not only that: Our index $k$ is twice shackled with the coefficient 2 inside a binomial coefficient. Multiplicative constants are usually harder to remove than additive constants.

We’re lucky this time, though. The $2k$’s are right where we need them for identity (5.21) to apply, so we get

$$\sum_{k\ge0}\binom{n+k}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1} = \sum_{k\ge0}\binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1}.$$

The two 2’s disappear, and so does one occurrence of $k$. So that’s one down and five to go.

The $k+1$ in the denominator is the most troublesome characteristic left, and now we can absorb it into $\binom{n}{k}$ using identity (5.6):

$$\sum_{k\ge0}\binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1} = \sum_{k}\binom{n+k}{k}\binom{n+1}{k+1}\frac{(-1)^k}{n+1}.$$

(Recall that $n \ge 0$.) Two down, four to go.

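To eliminate another $k$ we have two promising options. We could use symmetry on $\binom{n+k}{k}$; or we could negate the upper index $n+k$, thereby eliminating that $k$ as well as the factor $(-1)^k$. Let’s explore both possibilities, starting with the symmetry option:

$$\sum_k\binom{n+k}{k}\binom{n+1}{k+1}\frac{(-1)^k}{n+1} = \frac{1}{n+1}\sum_k\binom{n+k}{n}\binom{n+1}{k+1}(-1)^k.$$

Third down, three to go, and we’re in position to make a big gain by plugging into (5.24): Replacing $(l, m, n, s)$ by $(n+1, 1, n, n)$, we get

$$\frac{1}{n+1}\sum_k\binom{n+k}{n}\binom{n+1}{k+1}(-1)^k = \frac{1}{n+1}(-1)^n\binom{n-1}{-1} = 0.$$

For a minute I thought we’d have to punt.

Zero, eh? After all that work? Let’s check it when $n = 2$: $\binom{2}{0}\binom{0}{0}\frac{1}{1} - \binom{3}{2}\binom{2}{1}\frac{1}{2} + \binom{4}{4}\binom{4}{2}\frac{1}{3} = 1 - \frac{6}{2} + \frac{6}{3} = 0$. It checks.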

Just for the heck of it, let’s explore our other option, negating the upper index of $\binom{n+k}{k}$:

$$\sum_k\binom{n+k}{k}\binom{n+1}{k+1}\frac{(-1)^k}{n+1} = \frac{1}{n+1}\sum_k\binom{-n-1}{k}\binom{n+1}{k+1}.$$

Now (5.23) applies, with $(l, m, n, s) \leftarrow (n+1, 1, 0, -n-1)$, and

$$\frac{1}{n+1}\sum_k\binom{-n-1}{k}\binom{n+1}{k+1} = \frac{1}{n+1}\binom{0}{n}.$$

Try binary search: Replay the middle formula first, to see if the mistake was early or late.

Hey wait. This is zero when $n > 0$, but it’s 1 when $n = 0$. Our other path to the solution told us that the sum was zero in all cases! What gives? The sum actually does turn out to be 1 when $n = 0$, so the correct answer is ‘$[n = 0]$’. We must have made a mistake in the previous derivation.

Let’s do an instant replay on that derivation when $n = 0$, in order to see where the discrepancy first arises. Ah yes; we fell into the old trap mentioned earlier: We tried to apply symmetry when the upper index could be negative! We were not justified in replacing $\binom{n+k}{k}$ by $\binom{n+k}{n}$ when $k$ ranges over all integers, because this converts zero into a nonzero value when $k < -n$. (Sorry about that.)

The other factor in the sum, $\binom{n+1}{k+1}$, turns out to be zero when $k < -n$, except when $n = 0$ and $k = -1$. Hence our error didn’t show up when we checked the case $n = 2$. Exercise 6 explains what we should have done.
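A direct computation would have caught the slip at once. The sketch below is our own illustration (the names `problem6`, `binom`, `good`, and `bad` are hypothetical); it evaluates the original sum exactly for small $n$, and then replays the case $n = 0$ with and without the unjustified symmetry step, summing over a window of negative and nonnegative $k$.

```python
from fractions import Fraction
from math import comb, factorial

def problem6(n: int) -> Fraction:
    """Original sum over k >= 0; terms with k > n vanish because then 2k > n + k."""
    return sum((Fraction(comb(n + k, 2 * k) * comb(2 * k, k) * (-1) ** k, k + 1)
                for k in range(n + 1)), Fraction(0))

def binom(r: int, k: int) -> int:
    """Binomial coefficient for any integer upper index r; zero when k < 0."""
    if k < 0:
        return 0
    product = 1
    for i in range(k):
        product *= r - i
    return product // factorial(k)  # exact: k! divides any k consecutive integers

def sign(k: int) -> int:
    """(-1)^k for any integer k, avoiding float results for negative exponents."""
    return -1 if k % 2 else 1

# The sum equals [n = 0].
assert all(problem6(n) == (1 if n == 0 else 0) for n in range(25))

# Case n = 0 over all integers k: C(k, k) is correct, C(k, 0) is the faulty symmetry.
good = sum(binom(k, k) * binom(1, k + 1) * sign(k) for k in range(-5, 6))
bad = sum(binom(k, 0) * binom(1, k + 1) * sign(k) for k in range(-5, 6))
print(good, bad)
```

The two totals differ only because of the term $k = -1$, where $\binom{-1}{-1} = 0$ but $\binom{-1}{0} = 1$.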

Problem  7:  A new  obstacle. 

This one’s even tougher; we want a closed form for

$$\sum_{k\ge0}\binom{n+k}{m+2k}\binom{2k}{k}\frac{(-1)^k}{k+1}, \qquad\text{integers } m, n > 0.$$

If $m$ were 0 we’d have the sum from the problem we just finished. But it’s not, and we’re left with a real mess: nothing we used in Problem 6 works here. (Especially not the crucial first step.)

However, if we could somehow get rid of the $m$, we could use the result just derived. So our strategy is: Replace $\binom{n+k}{m+2k}$ by a sum of terms like $\binom{l+k}{2k}$ for some nonnegative integer $l$; the summand will then look like the summand in Problem 6, and we can interchange the order of summation.

What should we substitute for $\binom{n+k}{m+2k}$? A painstaking examination of the identities derived earlier in this chapter turns up only one suitable candidate, namely equation (5.26) in Table 169. And one way to use it is to replace the parameters $(l, m, n, q, k)$ by $(n+k-1, 2k, m-1, 0, j)$, respectively:

$$\begin{aligned}
\sum_{k\ge0}\binom{n+k}{m+2k}\binom{2k}{k}\frac{(-1)^k}{k+1}
&= \sum_{k\ge0}\;\sum_{0\le j\le n+k-1}\binom{n+k-1-j}{2k}\binom{j}{m-1}\binom{2k}{k}\frac{(-1)^k}{k+1} \\
&= \sum_{j\ge0}\binom{j}{m-1}\sum_{\substack{k\ge j-n+1\\ k\ge0}}\binom{n+k-1-j}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1}.
\end{aligned}$$

In the last step we’ve changed the order of summation, manipulating the conditions below the $\sum$’s according to the rules of Chapter 2.

We can’t quite replace the inner sum using the result of Problem 6, because it has the extra condition $k \ge j-n+1$. But this extra condition is superfluous unless $j-n+1 > 0$; that is, unless $j \ge n$. And when $j \ge n$, the first binomial coefficient of the inner sum is zero, because its upper index is between 0 and $k-1$, thus strictly less than the lower index $2k$. We may therefore place the additional restriction $j < n$ on the outer sum, without affecting which nonzero terms are included. This makes the restriction $k \ge j-n+1$ superfluous, and we can use the result of Problem 6. The double sum now comes tumbling down:

$$\begin{aligned}
\sum_{j\ge0}\binom{j}{m-1}\sum_{\substack{k\ge j-n+1\\ k\ge0}}\binom{n+k-1-j}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1}
&= \sum_{0\le j<n}\binom{j}{m-1}\sum_{k\ge0}\binom{n+k-1-j}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1} \\
&= \sum_{0\le j<n}\binom{j}{m-1}\,[\,n-1-j=0\,] \;=\; \binom{n-1}{m-1}.
\end{aligned}$$

The inner sums vanish except when $j = n-1$, so we get a simple closed form as our answer.
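After the scare in Problem 6, a numerical check of this closed form seems prudent. The following sketch (an illustration; the name `problem7` is ours) evaluates the sum exactly for small positive $m$ and $n$ and compares it with $\binom{n-1}{m-1}$.

```python
from fractions import Fraction
from math import comb

def problem7(m: int, n: int) -> Fraction:
    """Sum over k >= 0 for m, n > 0; comb() returns 0 once m + 2k > n + k."""
    return sum((Fraction(comb(n + k, m + 2 * k) * comb(2 * k, k) * (-1) ** k, k + 1)
                for k in range(n + 1)), Fraction(0))

for n in range(1, 15):
    for m in range(1, 15):
        assert problem7(m, n) == comb(n - 1, m - 1)

print(problem7(2, 3), comb(2, 1))
```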


Problem  8:  A different  obstacle. 

Let’s branch out from Problem 6 in another way by considering the sum

$$S_m = \sum_{k\ge0}\binom{n+k}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1+m}, \qquad\text{integers } m, n \ge 0.$$

Again, when $m = 0$ we have the sum we did before; but now the $m$ occurs in a different place. This problem is a bit harder yet than Problem 7, but (fortunately) we’re getting better at finding solutions. We can begin as in Problem 6,

$$S_m = \sum_{k\ge0}\binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1+m}.$$

Now (as in Problem 7) we try to expand the part that depends on $m$ into terms that we know how to deal with. When $m$ was zero, we absorbed $k+1$ into $\binom{n}{k}$; if $m > 0$, we can do the same thing if we expand $1/(k+1+m)$ into absorbable terms. And our luck still holds: We proved a suitable identity

$$\sum_{j=0}^{m}\binom{m}{j}\binom{r}{j}^{-1} = \frac{r+1}{r+1-m}, \qquad\text{integer } m \ge 0,\; r \notin \{0, 1, \ldots, m-1\}, \tag{5.33}$$

in Problem 1. Replacing $r$ by $-k-2$ gives the desired expansion,

$$S_m = \sum_{k\ge0}\binom{n+k}{k}\binom{n}{k}\frac{(-1)^k}{k+1}\sum_{j\ge0}\binom{m}{j}\binom{-k-2}{j}^{-1}.$$

Now the $(k+1)^{-1}$ can be absorbed into $\binom{n}{k}$, as planned. In fact, it could also be absorbed into $\binom{-k-2}{j}^{-1}$. Double absorption suggests that even more cancellation might be possible behind the scenes. Yes: expanding everything in our new summand into factorials and going back to binomial coefficients gives a formula that we can sum on $k$:


They expect us to check this on a sheet of scratch paper.


$$\begin{aligned}
S_m &= \frac{m!\,n!}{(m+n+1)!}\sum_{j\ge0}(-1)^j\binom{m+n+1}{n+1+j}\sum_k\binom{n+1+j}{k+j+1}\binom{-n-1}{k} \\
&= \frac{m!\,n!}{(m+n+1)!}\sum_{j\ge0}(-1)^j\binom{m+n+1}{n+1+j}\binom{j}{n}.
\end{aligned}$$

The sum over all integers $j$ is zero, by (5.24). Hence $-S_m$ is the sum for $j < 0$. To evaluate $-S_m$ for $j < 0$, let’s replace $j$ by $-k-1$ and sum for $k \ge 0$:

$$\begin{aligned}
S_m &= \frac{m!\,n!}{(m+n+1)!}\sum_{k\ge0}(-1)^k\binom{m+n+1}{n-k}\binom{-k-1}{n} \\
&= \frac{m!\,n!}{(m+n+1)!}\sum_{k\le n}(-1)^{n-k}\binom{m+n+1}{k}\binom{k-n-1}{n} \\
&= \frac{m!\,n!}{(m+n+1)!}\sum_{k\le n}(-1)^{k}\binom{m+n+1}{k}\binom{2n-k}{n} \\
&= \frac{m!\,n!}{(m+n+1)!}\sum_{k\le 2n}(-1)^{k}\binom{m+n+1}{k}\binom{2n-k}{n}.
\end{aligned}$$

Finally (5.25) applies, and we have our answer:

$$S_m = (-1)^n\frac{m!\,n!}{(m+n+1)!}\binom{m}{n} = (-1)^n\, m^{\underline{n}}\, m^{\underline{-n-1}}.$$

Whew; we’d better check it. When $n = 2$ we find

$$S_m = \frac{1}{m+1} - \frac{6}{m+2} + \frac{6}{m+3} = \frac{m(m-1)}{(m+1)(m+2)(m+3)}.$$

Our derivation requires $m$ to be an integer, but the result holds for all real $m$, because $(m+1)^{\overline{n+1}} S_m$ is a polynomial in $m$ of degree $\le n$.
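The scratch-paper check can also be delegated to a few lines of code. This sketch is illustrative only (the names `S_direct` and `S_closed` are ours); it uses the product form $(-1)^n m^{\underline{n}}/\bigl((m+1)(m+2)\cdots(m+n+1)\bigr)$ of the answer, so that rational non-integer values of $m$ can be tried as well.

```python
from fractions import Fraction
from math import comb, prod

def S_direct(m, n: int) -> Fraction:
    """Sum over k >= 0 of C(n+k,2k) C(2k,k) (-1)^k / (k+1+m); terms with k > n vanish."""
    return sum((Fraction(comb(n + k, 2 * k) * comb(2 * k, k) * (-1) ** k) / (k + 1 + m)
                for k in range(n + 1)), Fraction(0))

def S_closed(m, n: int) -> Fraction:
    """(-1)^n * m(m-1)...(m-n+1) / ((m+1)(m+2)...(m+n+1))."""
    falling = prod((Fraction(m) - i for i in range(n)), start=Fraction(1))
    rising = prod((Fraction(m) + 1 + i for i in range(n + 1)), start=Fraction(1))
    return (-1) ** n * falling / rising

# Integer m, as in the derivation.
for n in range(10):
    for m in range(10):
        assert S_direct(m, n) == S_closed(m, n)

# Rational m, where the polynomial argument says the result still holds.
for n in range(8):
    for m in (Fraction(1, 2), Fraction(7, 3), Fraction(-1, 4)):
        assert S_direct(m, n) == S_closed(m, n)

print(S_direct(1, 1), S_closed(1, 1))
```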
5.3 TRICKS OF THE TRADE

Let’s look next at three techniques that significantly amplify the methods we have already learned.

Trick 1: Going halves.

Many of our identities involve an arbitrary real number $r$. When $r$ has the special form “integer minus one half,” the binomial coefficient $\binom{r}{k}$ can be written as a quite different-looking product of binomial coefficients. This leads to a new family of identities that can be manipulated with surprising ease. One way to see how this works is to begin with the duplication formula

$$r^{\underline{k}}\,(r-\tfrac12)^{\underline{k}} = (2r)^{\underline{2k}}/2^{2k}, \qquad\text{integer } k \ge 0. \tag{5.34}$$

This identity is obvious if we expand the falling powers and interleave the factors on the left side:

$$r(r-\tfrac12)(r-1)(r-\tfrac32)\cdots(r-k+1)(r-k+\tfrac12) = \frac{(2r)(2r-1)\cdots(2r-2k+1)}{2\cdot2\cdot\ldots\cdot2}.$$

Now we can divide both sides by $k!^2$, and we get

$$\binom{r}{k}\binom{r-1/2}{k} = \binom{2r}{2k}\binom{2k}{k}\Big/2^{2k}, \qquad\text{integer } k. \tag{5.35}$$


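If we set $k = r = n$, where $n$ is an integer, this yields

$$\binom{n-1/2}{n} = \binom{2n}{n}\Big/2^{2n}, \qquad\text{integer } n. \tag{5.36}$$

And negating the upper index gives yet another useful formula,

$$\binom{-1/2}{n} = \left(\frac{-1}{4}\right)^{n}\binom{2n}{n}, \qquad\text{integer } n. \tag{5.37}$$

For example, when $n = 4$ we have

$$\begin{aligned}
\binom{-1/2}{4} &= \frac{(-1/2)(-3/2)(-5/2)(-7/2)}{4!} \\
&= \left(\frac{-1}{2}\right)^{4}\frac{1\cdot3\cdot5\cdot7}{1\cdot2\cdot3\cdot4} \\
&= \left(\frac{-1}{4}\right)^{4}\frac{1\cdot3\cdot5\cdot7\cdot2\cdot4\cdot6\cdot8}{1\cdot2\cdot3\cdot4\cdot1\cdot2\cdot3\cdot4} = \left(\frac{-1}{4}\right)^{4}\binom{8}{4}.
\end{aligned}$$

Notice how we’ve changed a product of odd numbers into a factorial.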


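This should really be called Trick 1/2.

. . . we halve. . .

Identity (5.35) has an amusing corollary. Let $r = \frac12 n$, and take the sum over all integers $k$. The result is

$$\sum_k\binom{n}{2k}\binom{2k}{k}2^{-2k} = \sum_k\binom{n/2}{k}\binom{(n-1)/2}{k} = \binom{n-1/2}{\lfloor n/2\rfloor}, \qquad\text{integer } n \ge 0 \tag{5.38}$$

by (5.23), because either $n/2$ or $(n-1)/2$ is $\lfloor n/2\rfloor$, a nonnegative integer!

We can also use Vandermonde’s convolution (5.27) to deduce that

$$\sum_k\binom{-1/2}{k}\binom{-1/2}{n-k} = \binom{-1}{n} = (-1)^n, \qquad\text{integer } n \ge 0.$$

Plugging in the values from (5.37) gives

$$\binom{-1/2}{k}\binom{-1/2}{n-k} = \left(\frac{-1}{4}\right)^{k}\binom{2k}{k}\left(\frac{-1}{4}\right)^{n-k}\binom{2(n-k)}{n-k} = \frac{(-1)^n}{4^n}\binom{2k}{k}\binom{2n-2k}{n-k};$$

this is what sums to $(-1)^n$. Hence we have a remarkable property of the “middle” elements of Pascal’s triangle:

$$\sum_k\binom{2k}{k}\binom{2n-2k}{n-k} = 4^n, \qquad\text{integer } n \ge 0. \tag{5.39}$$

For example, $\binom{0}{0}\binom{6}{3} + \binom{2}{1}\binom{4}{2} + \binom{4}{2}\binom{2}{1} + \binom{6}{3}\binom{0}{0} = 1\cdot20 + 2\cdot6 + 6\cdot2 + 20\cdot1 = 64 = 4^3$.

These illustrations of our first trick indicate that it’s wise to try changing binomial coefficients of the form $\binom{2k}{k}$ into binomial coefficients of the form $\binom{n-1/2}{k}$, where $n$ is some appropriate integer (usually 0, 1, or $k$); the resulting formula might be much simpler.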
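The half-integer identities are pleasant to verify with exact rational arithmetic. The sketch below is our own illustration; `binom` is a hypothetical helper for generalized binomial coefficients, and the loops test (5.35), (5.37), (5.38), and (5.39) on small cases.

```python
from fractions import Fraction
from math import comb, factorial

def binom(r, k: int) -> Fraction:
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k!; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    product = Fraction(1)
    for i in range(k):
        product *= Fraction(r) - i
    return product / factorial(k)

half = Fraction(1, 2)

# (5.35): C(r,k) C(r-1/2,k) = C(2r,2k) C(2k,k) / 2^(2k), for arbitrary rational r.
for r in (Fraction(7, 3), Fraction(-5, 2), Fraction(4), Fraction(1, 5)):
    for k in range(7):
        left = binom(r, k) * binom(r - half, k)
        right = binom(2 * r, 2 * k) * comb(2 * k, k) / Fraction(4) ** k
        assert left == right

for n in range(12):
    # (5.37): C(-1/2, n) = (-1/4)^n C(2n, n)
    assert binom(-half, n) == Fraction(-1, 4) ** n * comb(2 * n, n)
    # (5.38): sum of C(n,2k) C(2k,k) / 4^k equals C(n-1/2, floor(n/2))
    s38 = sum(Fraction(comb(n, 2 * k) * comb(2 * k, k), 4 ** k) for k in range(n // 2 + 1))
    assert s38 == binom(n - half, n // 2)
    # (5.39): the middle-element convolution sums to 4^n
    assert sum(comb(2 * k, k) * comb(2 * n - 2 * k, n - k) for k in range(n + 1)) == 4 ** n

print(binom(-half, 4), Fraction(comb(8, 4), 4 ** 4))
```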

Trick 2: High-order differences.

We saw earlier that it’s possible to evaluate partial sums of the series $\binom{n}{k}(-1)^k$, but not of the series $\binom{n}{k}$. It turns out that there are many important applications of binomial coefficients with alternating signs, $\binom{n}{k}(-1)^k$. One of the reasons for this is that such coefficients are intimately associated with the difference operator $\Delta$ defined in Section 2.6.

The difference $\Delta f$ of a function $f$ at the point $x$ is

$$\Delta f(x) = f(x+1) - f(x);$$

if we apply $\Delta$ again, we get the second difference

$$\begin{aligned}
\Delta^2 f(x) = \Delta f(x+1) - \Delta f(x) &= \bigl(f(x+2) - f(x+1)\bigr) - \bigl(f(x+1) - f(x)\bigr) \\
&= f(x+2) - 2f(x+1) + f(x),
\end{aligned}$$

which is analogous to the second derivative. Similarly, we have

$$\Delta^3 f(x) = f(x+3) - 3f(x+2) + 3f(x+1) - f(x);$$

$$\Delta^4 f(x) = f(x+4) - 4f(x+3) + 6f(x+2) - 4f(x+1) + f(x);$$

and so on. Binomial coefficients enter these formulas with alternating signs. In general, the $n$th difference is

$$\Delta^n f(x) = \sum_k\binom{n}{k}(-1)^{n-k}f(x+k), \qquad\text{integer } n \ge 0. \tag{5.40}$$

This formula is easily proved by induction, but there’s also a nice way to prove it directly using the elementary theory of operators. Recall that Section 2.6 defines the shift operator $E$ by the rule

$$Ef(x) = f(x+1);$$

hence the operator $\Delta$ is $E - 1$, where 1 is the identity operator defined by the rule $1f(x) = f(x)$. By the binomial theorem,

$$\Delta^n = (E-1)^n = \sum_k\binom{n}{k}E^k(-1)^{n-k}.$$

This is an equation whose elements are operators; it is equivalent to (5.40), since $E^k$ is the operator that takes $f(x)$ into $f(x+k)$.
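Formula (5.40) can be compared with the definition of $\Delta$ applied $n$ times. The following Python sketch is an illustration only; the names `delta` and `nth_difference` are ours, and the test polynomial is an arbitrary choice.

```python
from math import comb

def delta(f):
    """The difference operator: returns the function x -> f(x+1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def nth_difference(f, n: int, x):
    """Right-hand side of (5.40): sum of C(n,k) (-1)^(n-k) f(x+k)."""
    return sum(comb(n, k) * (-1) ** (n - k) * f(x + k) for k in range(n + 1))

f = lambda x: x ** 4 - 3 * x + 2   # arbitrary test function
g = f                              # g will hold Δ^n f, built by repeated differencing
for n in range(7):
    assert all(g(x) == nth_difference(f, n, x) for x in range(-5, 6))
    g = delta(g)

print([nth_difference(f, n, 0) for n in range(7)])
```

Because `f` has degree 4, the printed differences at 0 are zero from $n = 5$ onward, which anticipates the remark on polynomials made later in this section.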

An interesting and important case arises when we consider negative falling powers. Let $f(x) = (x-1)^{\underline{-1}} = 1/x$. Then, by rule (2.45), we have $\Delta f(x) = (-1)(x-1)^{\underline{-2}}$, $\Delta^2 f(x) = (-1)(-2)(x-1)^{\underline{-3}}$, and in general

$$\Delta^n\bigl((x-1)^{\underline{-1}}\bigr) = (-1)^{\underline{n}}\,(x-1)^{\underline{-n-1}} = (-1)^n\frac{n!}{x(x+1)\cdots(x+n)}.$$

Equation (5.40) now tells us that

$$\sum_k\binom{n}{k}\frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\cdots(x+n)} = x^{-1}\binom{x+n}{n}^{-1}, \qquad x \notin \{0, -1, \ldots, -n\}. \tag{5.41}$$

For example,

$$\frac{1}{x} - \frac{4}{x+1} + \frac{6}{x+2} - \frac{4}{x+3} + \frac{1}{x+4} = \frac{4!}{x(x+1)(x+2)(x+3)(x+4)} = 1\Big/x\binom{x+4}{4}.$$

The sum in (5.41) is the partial fraction expansion of $n!/\bigl(x(x+1)\cdots(x+n)\bigr)$.
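Identity (5.41) holds for every $x$ outside the excluded set, so rational test points make a fair check. A minimal sketch, with hypothetical names `lhs_541` and `rhs_541`:

```python
from fractions import Fraction
from math import comb, factorial, prod

def lhs_541(n: int, x) -> Fraction:
    """Partial-fraction side: sum of C(n,k) (-1)^k / (x+k)."""
    return sum((Fraction(comb(n, k) * (-1) ** k) / (x + k) for k in range(n + 1)),
               Fraction(0))

def rhs_541(n: int, x) -> Fraction:
    """Product side: n! / (x (x+1) ... (x+n))."""
    return Fraction(factorial(n)) / prod((Fraction(x) + j for j in range(n + 1)),
                                         start=Fraction(1))

# Test points avoid the excluded values 0, -1, ..., -n.
for n in range(9):
    for x in (Fraction(1, 3), Fraction(-7, 2), Fraction(5), Fraction(22, 7)):
        assert lhs_541(n, x) == rhs_541(n, x)

print(lhs_541(4, Fraction(1)), rhs_541(4, Fraction(1)))
```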

Significant results can be obtained from positive falling powers too. If $f(x)$ is a polynomial of degree $d$, the difference $\Delta f(x)$ is a polynomial of degree $d-1$; therefore $\Delta^d f(x)$ is a constant, and $\Delta^n f(x) = 0$ if $n > d$. This extremely important fact simplifies many formulas.

A closer look gives further information: Let

$$f(x) = a_d x^d + a_{d-1}x^{d-1} + \cdots + a_1 x^1 + a_0 x^0$$

be any polynomial of degree $d$. We will see in Chapter 6 that we can express ordinary powers as sums of falling powers (for example, $x^2 = x^{\underline{2}} + x^{\underline{1}}$); hence there are coefficients $b_d, b_{d-1}, \ldots, b_1, b_0$ such that

$$f(x) = b_d x^{\underline{d}} + b_{d-1}x^{\underline{d-1}} + \cdots + b_1 x^{\underline{1}} + b_0 x^{\underline{0}}.$$

(It turns out that $b_d = a_d$ and $b_0 = a_0$, but the intervening coefficients are related in a more complicated way.) Let $c_k = k!\,b_k$ for $0 \le k \le d$. Then

$$f(x) = c_d\binom{x}{d} + c_{d-1}\binom{x}{d-1} + \cdots + c_1\binom{x}{1} + c_0\binom{x}{0};$$

thus, any polynomial can be represented as a sum of multiples of binomial coefficients. Such an expansion is called the Newton series of $f(x)$, because Isaac Newton used it extensively.

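We observed earlier in this chapter that the addition formula implies

$$\Delta\left(\binom{x}{k}\right) = \binom{x}{k-1}.$$

Therefore, by induction, the $n$th difference of a Newton series is very simple:

$$\Delta^n f(x) = c_d\binom{x}{d-n} + c_{d-1}\binom{x}{d-1-n} + \cdots + c_1\binom{x}{1-n} + c_0\binom{x}{-n}.$$

If we now set $x = 0$, all terms $c_k\binom{x}{k-n}$ on the right side are zero, except the term with $k - n = 0$; hence

$$\Delta^n f(0) = \begin{cases} c_n, & \text{if } n \le d; \\ 0, & \text{if } n > d. \end{cases}$$

The Newton series for $f(x)$ is therefore

$$f(x) = \Delta^d f(0)\binom{x}{d} + \Delta^{d-1}f(0)\binom{x}{d-1} + \cdots + \Delta f(0)\binom{x}{1} + f(0)\binom{x}{0}.$$

For example, suppose $f(x) = x^3$. It’s easy to calculate

$$\begin{array}{llll}
f(0) = 0, & f(1) = 1, & f(2) = 8, & f(3) = 27; \\
\Delta f(0) = 1, & \Delta f(1) = 7, & \Delta f(2) = 19; & \\
\Delta^2 f(0) = 6, & \Delta^2 f(1) = 12; & & \\
\Delta^3 f(0) = 6. & & &
\end{array}$$

So the Newton series is $x^3 = 6\binom{x}{3} + 6\binom{x}{2} + 1\binom{x}{1} + 0\binom{x}{0}$.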
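The difference table for $x^3$ is exactly what a short program would build. The sketch below is illustrative (the names `newton_coefficients`, `newton_eval`, and `binom` are ours); it computes $\Delta^n f(0)$ from the values $f(0), \ldots, f(d)$ and then evaluates the resulting Newton series at other integers, including negative ones.

```python
from math import factorial

def binom(x: int, k: int) -> int:
    """Binomial coefficient for any integer upper index x; zero when k < 0."""
    if k < 0:
        return 0
    product = 1
    for i in range(k):
        product *= x - i
    return product // factorial(k)  # exact: k! divides any k consecutive integers

def newton_coefficients(f, d: int) -> list:
    """Return [f(0), Δf(0), ..., Δ^d f(0)] using a difference table on f(0..d)."""
    row = [f(x) for x in range(d + 1)]
    coeffs = []
    while row:
        coeffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return coeffs

def newton_eval(coeffs, x: int) -> int:
    """Evaluate sum of c_k * C(x, k)."""
    return sum(c * binom(x, k) for k, c in enumerate(coeffs))

cube = lambda x: x ** 3
c = newton_coefficients(cube, 3)
print(c)
assert all(newton_eval(c, x) == cube(x) for x in range(-10, 11))
```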

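Our formula $\Delta^n f(0) = c_n$ can also be stated in the following way, using (5.40) with $x = 0$:

$$\sum_k\binom{n}{k}(-1)^k\left(c_0\binom{k}{0} + c_1\binom{k}{1} + c_2\binom{k}{2} + \cdots\right) = (-1)^n c_n, \qquad\text{integer } n \ge 0.$$

Here $\langle c_0, c_1, c_2, \ldots\rangle$ is an arbitrary sequence of coefficients; the infinite sum $c_0\binom{k}{0} + c_1\binom{k}{1} + c_2\binom{k}{2} + \cdots$ is actually finite for all $k \ge 0$, so convergence is not an issue. In particular, we can prove the important identity

$$\sum_k\binom{n}{k}(-1)^k\bigl(a_0 + a_1 k + \cdots + a_n k^n\bigr) = (-1)^n n!\,a_n, \qquad\text{integer } n \ge 0, \tag{5.42}$$

because the polynomial $a_0 + a_1 k + \cdots + a_n k^n$ can always be written as a Newton series $c_0\binom{k}{0} + c_1\binom{k}{1} + \cdots + c_n\binom{k}{n}$ with $c_n = n!\,a_n$.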

Many sums that appear to be hopeless at first glance can actually be summed almost trivially by using the idea of $n$th differences. For example, let’s consider the identity

$$\sum_k\binom{n}{k}\binom{r-sk}{n}(-1)^k = s^n, \qquad\text{integer } n \ge 0. \tag{5.43}$$

This looks very impressive, because it’s quite different from anything we’ve seen so far. But it really is easy to understand, once we notice the telltale factor $\binom{n}{k}(-1)^k$ in the summand, because the function

$$f(k) = \binom{r-sk}{n} = \frac{1}{n!}(-1)^n s^n k^n + \cdots = (-1)^n s^n\binom{k}{n} + \cdots$$

is a polynomial in $k$ of degree $n$, with leading coefficient $(-1)^n s^n/n!$. Therefore (5.43) is nothing more than an application of (5.42).

(Since $E = 1 + \Delta$, $E^x = \sum_k\binom{x}{k}\Delta^k$; and $E^x g(a) = g(a+x)$.)
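Both (5.42) and (5.43) lend themselves to an exact check. In the sketch below, which is only an illustration, `binom` is a hypothetical helper for generalized binomial coefficients, and the polynomial coefficients and the values of $r$ and $s$ are arbitrary test data of ours.

```python
from fractions import Fraction
from math import comb, factorial

def binom(r, k: int) -> Fraction:
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k!; zero when k < 0."""
    if k < 0:
        return Fraction(0)
    product = Fraction(1)
    for i in range(k):
        product *= Fraction(r) - i
    return product / factorial(k)

def alternating_sum(n: int, p):
    """Left side of (5.42): sum of C(n,k) (-1)^k p(k)."""
    return sum(comb(n, k) * (-1) ** k * p(k) for k in range(n + 1))

# (5.42) with the arbitrary polynomial 5 - 2k + 7k^3, so n = 3 and a_n = 7.
coeffs = [5, -2, 0, 7]
p = lambda k: sum(a * k ** i for i, a in enumerate(coeffs))
assert alternating_sum(3, p) == (-1) ** 3 * factorial(3) * coeffs[3]

# (5.43): sum of C(n,k) C(r - s k, n) (-1)^k equals s^n, whatever r is.
for n in range(8):
    for r in (Fraction(0), Fraction(9, 2), Fraction(-4, 3)):
        for s in (Fraction(1), Fraction(-2), Fraction(3, 5)):
            total = alternating_sum(n, lambda k: binom(r - s * k, n))
            assert total == s ** n

print(alternating_sum(3, p))
```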

We have discussed Newton series under the assumption that $f(x)$ is a polynomial. But we’ve also seen that infinite Newton series

$$f(x) = c_0\binom{x}{0} + c_1\binom{x}{1} + c_2\binom{x}{2} + \cdots$$

make sense too, because such sums are always finite when $x$ is a nonnegative integer. Our derivation of the formula $\Delta^n f(0) = c_n$ works in the infinite case, just as in the polynomial case; so we have the general identity

$$f(x) = f(0)\binom{x}{0} + \Delta f(0)\binom{x}{1} + \Delta^2 f(0)\binom{x}{2} + \Delta^3 f(0)\binom{x}{3} + \cdots, \qquad\text{integer } x \ge 0. \tag{5.44}$$

This formula is valid for any function $f(x)$ that is defined for nonnegative integers $x$. Moreover, if the right-hand side converges for other values of $x$, it defines a function that “interpolates” $f(x)$ in a natural way. (There are infinitely many ways to interpolate function values, so we cannot assert that (5.44) is true for all $x$ that make the infinite series converge. For example, if we let $f(x) = \sin(\pi x)$, we have $f(x) = 0$ at all integer points, so the right-hand side of (5.44) is identically zero; but the left-hand side is nonzero at all noninteger $x$.)

A Newton series is finite calculus’s answer to infinite calculus’s Taylor series. Just as a Taylor series can be written

$$g(a+x) = \frac{g(a)}{0!}x^0 + \frac{g'(a)}{1!}x^1 + \frac{g''(a)}{2!}x^2 + \frac{g'''(a)}{3!}x^3 + \cdots,$$

the Newton series for $f(x) = g(a+x)$ can be written

$$g(a+x) = \frac{g(a)}{0!}x^{\underline{0}} + \frac{\Delta g(a)}{1!}x^{\underline{1}} + \frac{\Delta^2 g(a)}{2!}x^{\underline{2}} + \frac{\Delta^3 g(a)}{3!}x^{\underline{3}} + \cdots. \tag{5.45}$$

(This is the same as (5.44), because $\Delta^n f(0) = \Delta^n g(a)$ for all $n \ge 0$ when $f(x) = g(a+x)$.) Both the Taylor and Newton series are finite when $g$ is a polynomial, or when $x = 0$; in addition, the Newton series is finite when $x$ is a positive integer. Otherwise the sums may or may not converge for particular values of $x$. If the Newton series converges when $x$ is not a nonnegative integer, it might actually converge to a value that’s different from $g(a+x)$, because the Newton series (5.45) depends only on the spaced-out function values $g(a)$, $g(a+1)$, $g(a+2)$, . . . .

One example of a convergent Newton series is provided by the binomial theorem. Let $g(x) = (1+z)^x$, where $z$ is a fixed complex number such that $|z| < 1$. Then $\Delta g(x) = (1+z)^{x+1} - (1+z)^x = z(1+z)^x$, hence $\Delta^n g(x) = z^n(1+z)^x$. In this case the infinite Newton series

$$g(a+x) = \sum_n\Delta^n g(a)\binom{x}{n} = (1+z)^a\sum_n\binom{x}{n}z^n$$

converges to the “correct” value $(1+z)^{a+x}$, for all $x$.

James Stirling tried to use Newton series to generalize the factorial function to noninteger values. First he found coefficients $S_n$ such that

$$x! = \sum_n S_n\binom{x}{n} = S_0\binom{x}{0} + S_1\binom{x}{1} + S_2\binom{x}{2} + \cdots$$

is an identity for $x = 0$, $x = 1$, $x = 2$, etc. But he discovered that the resulting series doesn’t converge except when $x$ is a nonnegative integer. So he tried again, this time writing

$$\ln x! = \sum_n s_n\binom{x}{n} = s_0\binom{x}{0} + s_1\binom{x}{1} + s_2\binom{x}{2} + \cdots.$$

Now $\Delta(\ln x!) = \ln(x+1)! - \ln x! = \ln(x+1)$, hence

$$\begin{aligned}
s_n &= \Delta^n(\ln x!)\big|_{x=0} \\
&= \Delta^{n-1}\bigl(\ln(x+1)\bigr)\big|_{x=0} \\
&= \sum_k\binom{n-1}{k}(-1)^{n-1-k}\ln(k+1)
\end{aligned}$$

by (5.40). The coefficients are therefore $s_0 = s_1 = 0$; $s_2 = \ln 2$; $s_3 = \ln 3 - 2\ln 2 = \ln\frac34$; $s_4 = \ln 4 - 3\ln 3 + 3\ln 2 = \ln\frac{32}{27}$; etc. In this way Stirling obtained a series that does converge (although he didn’t prove it); in fact, his series converges for all $x > -1$. He was thereby able to evaluate $\frac12!$ satisfactorily. Exercise 88 tells the rest of the story.
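Stirling’s coefficients are easy to tabulate in floating point, at least for small $n$. The sketch below is an illustration under stated assumptions: the names `s`, `binom_real`, and `ln_factorial_series` are ours, the alternating sum for $s_n$ loses accuracy to cancellation as $n$ grows (so only a couple of dozen terms are used), and `math.lgamma(x + 1)` is taken as the reference value of $\ln x!$.

```python
from math import comb, isclose, lgamma, log

def s(n: int) -> float:
    """Stirling's coefficient s_n = Δ^n (ln x!) at x = 0, computed via (5.40)."""
    if n == 0:
        return 0.0  # ln 0! = 0
    return sum(comb(n - 1, k) * (-1) ** (n - 1 - k) * log(k + 1) for k in range(n))

def binom_real(x: float, n: int) -> float:
    """x(x-1)...(x-n+1)/n! for real x and integer n >= 0."""
    result = 1.0
    for i in range(n):
        result *= (x - i) / (i + 1)
    return result

def ln_factorial_series(x: float, terms: int) -> float:
    """Partial sum of the Newton series for ln x! using the first `terms` coefficients."""
    return sum(s(n) * binom_real(x, n) for n in range(terms))

# At nonnegative integers the series is finite and reproduces ln x!.
for x in range(7):
    assert isclose(ln_factorial_series(x, x + 1), lgamma(x + 1), abs_tol=1e-9)

print([round(s(n), 6) for n in range(5)])
# Partial sum at x = 1/2 next to the reference value ln Γ(3/2):
print(ln_factorial_series(0.5, 25), lgamma(1.5))
```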

Trick 3: Inversion.

A special case of the rule (5.45) we’ve just derived for Newton’s series can be rewritten in the following way:

$$g(n) = \sum_k\binom{n}{k}(-1)^k f(k) \iff f(n) = \sum_k\binom{n}{k}(-1)^k g(k). \tag{5.48}$$


"Forasmuch  as 
these  terms  increase 
very  fast,  their 
differences  will 
make  a diverging 
progression,  which 
hinders  the  ordinate 
of  the  parabola 
from  approaching  to 
the  truth;  therefore 
in  this  and  the  like 
cases, I interpolate
the logarithms of
the terms, whose
differences constitute
a series swiftly
converging.”

—J.  Stirling  [281] 


(Proofs of convergence were not invented until the nineteenth century.)

Invert this: ‘zinb ppo’.


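This dual relationship between $f$ and $g$ is called an inversion formula; it’s rather like the Möbius inversion formulas (4.56) and (4.61) that we encountered in Chapter 4. Inversion formulas tell us how to solve “implicit recurrences,” where an unknown sequence is embedded in a sum.

For example, $g(n)$ might be a known function, and $f(n)$ might be unknown; and we might have found a way to prove that $g(n) = \sum_k\binom{n}{k}(-1)^k f(k)$. Then (5.48) lets us express $f(n)$ as a sum of known values.

We can prove (5.48) directly by using the basic methods at the beginning of this chapter. If $g(n) = \sum_k\binom{n}{k}(-1)^k f(k)$ for all $n \ge 0$, then

$$\begin{aligned}
\sum_k\binom{n}{k}(-1)^k g(k) &= \sum_k\binom{n}{k}(-1)^k\sum_j\binom{k}{j}(-1)^j f(j) \\
&= \sum_j f(j)\sum_k\binom{n}{k}(-1)^{k+j}\binom{k}{j} \\
&= \sum_j f(j)\sum_k\binom{n}{j}(-1)^{k+j}\binom{n-j}{k-j} \\
&= \sum_j f(j)\binom{n}{j}\sum_k(-1)^k\binom{n-j}{k} \\
&= \sum_j f(j)\binom{n}{j}\,[\,n-j=0\,] = f(n).
\end{aligned}$$

The proof in the other direction is, of course, the same, because the relation between $f$ and $g$ is symmetric.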
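Because the relation is symmetric, applying the same transformation twice must return the original sequence. A minimal sketch of that check, with a hypothetical function name `transform` and arbitrary test data:

```python
from math import comb

def transform(seq: list) -> list:
    """Map f(0..N) to g(0..N), where g(n) = sum of C(n,k) (-1)^k f(k)."""
    return [sum(comb(n, k) * (-1) ** k * seq[k] for k in range(n + 1))
            for n in range(len(seq))]

f = [3, 1, 4, 1, 5, 9, 2, 6]   # arbitrary test sequence
g = transform(f)
assert transform(g) == f       # inversion formula (5.48): the map is its own inverse

print(g)
```

Only the first $n + 1$ values of $f$ enter $g(n)$, which is why the check works on a finite prefix of the sequence.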

Let’s illustrate (5.48) by applying it to the “football victory problem”: A group of $n$ fans of the winning football team throw their hats high into the air. The hats come back randomly, one hat to each of the $n$ fans. How many ways $h(n, k)$ are there for exactly $k$ fans to get their own hats back?

For example, if $n = 4$ and if the hats and fans are named A, B, C, D, the $4! = 24$ possible ways for hats to land generate the following numbers of rightful owners:

    ABCD 4    BACD 2    CABD 1    DABC 0
    ABDC 2    BADC 0    CADB 0    DACB 1
    ACBD 2    BCAD 1    CBAD 2    DBAC 1
    ACDB 1    BCDA 0    CBDA 1    DBCA 2
    ADBC 1    BDAC 0    CDAB 0    DCAB 0
    ADCB 2    BDCA 1    CDBA 0    DCBA 0

Therefore $h(4,4) = 1$; $h(4,3) = 0$; $h(4,2) = 6$; $h(4,1) = 8$; $h(4,0) = 9$.

We can determine $h(n, k)$ by noticing that it is the number of ways to choose $k$ lucky hat owners, namely $\binom{n}{k}$, times the number of ways to arrange the remaining $n-k$ hats so that none of them goes to the right owner, namely $h(n-k, 0)$. A permutation is called a derangement if it moves every item, and the number of derangements of $n$ objects is sometimes denoted by the symbol ‘$n¡$’, read “$n$ subfactorial.” Therefore $h(n-k, 0) = (n-k)¡$, and we have the general formula

$$h(n,k) = \binom{n}{k}h(n-k,0) = \binom{n}{k}(n-k)¡.$$

(Subfactorial notation isn’t standard, and it’s not clearly a great idea; but let’s try it awhile to see if we grow to like it. We can always resort to ‘$D_n$’ or something, if ‘$n¡$’ doesn’t work out.)

Our problem would be solved if we had a closed form for $n¡$, so let’s see what we can find. There’s an easy way to get a recurrence, because the sum of $h(n, k)$ for all $k$ is the total number of permutations of $n$ hats:

$$n! = \sum_k h(n,k) = \sum_k\binom{n}{k}(n-k)¡ = \sum_k\binom{n}{k}k¡, \qquad\text{integer } n \ge 0. \tag{5.49}$$

(We’ve changed $k$ to $n-k$ and $\binom{n}{n-k}$ to $\binom{n}{k}$ in the last step.) With this implicit recurrence we can compute all the $h(n, k)$’s we like:

    n   h(n,0)  h(n,1)  h(n,2)  h(n,3)  h(n,4)  h(n,5)  h(n,6)
    0      1
    1      0       1
    2      1       0       1
    3      2       3       0       1
    4      9       8       6       0       1
    5     44      45      20      10       0       1
    6    265     264     135      40      15       0       1

For example, here’s how the row for $n = 4$ can be computed: The two rightmost entries are obvious: there’s just one way for all hats to land correctly, and there’s no way for just three fans to get their own. (Whose hat would the fourth fan get?) When $k = 2$ and $k = 1$, we can use our equation for $h(n, k)$, giving $h(4,2) = \binom{4}{2}h(2,0) = 6\cdot1 = 6$, and $h(4,1) = \binom{4}{1}h(3,0) = 4\cdot2 = 8$. We can’t use this equation for $h(4,0)$; rather, we can, but it gives us $h(4,0) = \binom{4}{0}h(4,0)$, which is true but useless. Taking another tack, we can use the relation $h(4,0) + 8 + 6 + 0 + 1 = 4!$ to deduce that $h(4,0) = 9$; this is the value of $4¡$. Similarly $n¡$ depends on the values of $k¡$ for $k < n$.
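The procedure just described for the row $n = 4$ works for every row, and it is short enough to code. The following sketch is our own illustration (the names `subfactorials`, `h_row`, and `h_row_brute` are hypothetical); it solves the implicit recurrence (5.49) for the term $k = n$, builds rows of the table, and cross-checks one row by enumerating permutations.

```python
from itertools import permutations
from math import comb, factorial

def subfactorials(limit: int) -> list:
    """d[n] = n¡, from n! = sum_k C(n,k) k¡ solved for the k = n term."""
    d = []
    for n in range(limit + 1):
        d.append(factorial(n) - sum(comb(n, k) * d[k] for k in range(n)))
    return d

def h_row(n: int, d: list) -> list:
    """Row n of the table: h(n,k) = C(n,k) * (n-k)¡ for k = 0..n."""
    return [comb(n, k) * d[n - k] for k in range(n + 1)]

def h_row_brute(n: int) -> list:
    """Count permutations of n hats by their number of fixed points."""
    counts = [0] * (n + 1)
    for p in permutations(range(n)):
        counts[sum(i == x for i, x in enumerate(p))] += 1
    return counts

d = subfactorials(6)
for n in range(7):
    print(n, h_row(n, d))

assert h_row(4, d) == h_row_brute(4)
assert all(sum(h_row(n, d)) == factorial(n) for n in range(7))
```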


The art of mathematics, as of life, is knowing which truths are useless.

Baseball fans: .367 is also Ty Cobb’s lifetime batting average, the all-time
record. Can this be a coincidence?

(Hey wait, you’re fudging. Cobb’s average was 4191/11429 ≈ .366699, while
1/e ≈ .367879. But maybe if Wade Boggs has a few really good seasons. . . )


How can we solve a recurrence like (5.49)? Easy; it has the form of (5.48),
with g(n) = n! and f(k) = (-1)^k k¡. Hence its solution is

n¡ = (-1)^n \sum_k \binom{n}{k} (-1)^k k!.

Well, this isn’t really a solution; it’s a sum that should be put into closed form
if possible. But it’s better than a recurrence. The sum can be simplified, since
k! cancels with a hidden k! in \binom{n}{k}, so let’s try that: We get

n¡ = \sum_{0 \le k \le n} \frac{n!}{(n-k)!} (-1)^{n+k} = n! \sum_{0 \le k \le n} \frac{(-1)^k}{k!}.   (5.50)


The remaining sum converges rapidly to the number \sum_{k \ge 0} (-1)^k/k! = e^{-1}.
In fact, the terms that are excluded from the sum are

n! \sum_{k > n} \frac{(-1)^k}{k!} = \frac{(-1)^{n+1}}{n+1} \sum_{k \ge 0} (-1)^k \frac{(n+1)!}{(k+n+1)!}
    = \frac{(-1)^{n+1}}{n+1} \left( 1 - \frac{1}{n+2} + \frac{1}{(n+2)(n+3)} - \cdots \right),

and the parenthesized quantity lies between 1 and 1 - 1/(n+2) = (n+1)/(n+2). Therefore
the difference between n¡ and n!/e is roughly 1/n in absolute value; more
precisely, it lies between 1/(n + 1) and 1/(n + 2). But n¡ is an integer.
Therefore it must be what we get when we round n!/e to the nearest integer,
if n > 0. So we have the closed form we seek:

n¡ = \lfloor n!/e + 1/2 \rfloor + [n = 0].   (5.51)
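As a sanity check, the exact sum (5.50) and the rounded closed form (5.51) can be compared numerically. The sketch below uses hypothetical function names of our own; the rounded version relies on floating point, so it is only trustworthy for small n.

```python
from fractions import Fraction
from math import e, factorial, floor

def subfactorial(n):
    """n¡ computed exactly from (5.50): n! times the sum of (-1)^k / k!, 0 <= k <= n."""
    total = sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))
    value = factorial(n) * total
    assert value.denominator == 1   # the result is always an integer
    return int(value)

def subfactorial_rounded(n):
    """Closed form (5.51); floating-point error limits this to small n."""
    return floor(factorial(n) / e + 0.5) + (1 if n == 0 else 0)

for n in range(8):
    print(n, subfactorial(n), subfactorial_rounded(n))
```

The correction term [n = 0] matters only for n = 0, where rounding 1/e would otherwise give 0 instead of 1.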


This is the number of ways that no fan gets the right hat back. When
n is large, it’s more meaningful to know the probability that this happens.
If we assume that each of the n! arrangements is equally likely (because the
hats were thrown extremely high), this probability is

\frac{n¡}{n!} = \frac{n!/e + O(1)}{n!} \sim \frac{1}{e} = .367. . . .

So when n gets large the probability that all hats are misplaced is almost 37%.

Incidentally, recurrence (5.49) for subfactorials is exactly the same as
(5.46), the first recurrence considered by Stirling when he was trying to
generalize the factorial function. Hence s_k = k¡. These coefficients are so large,
it’s no wonder the infinite series (5.46) diverges for noninteger x.

Before leaving this problem, let’s look briefly at two interesting patterns
that leap out at us in the table of small h(n, k). First, it seems that the
numbers 1, 3, 6, 10, 15, . . . below the all-0 diagonal are the triangular numbers.
This observation is easy to prove, since those table entries are the h(n, n-2)’s,
and we have

h(n, n-2) = \binom{n}{n-2} 2¡ = \binom{n}{2}.

It also seems that the numbers in the first two columns differ by ±1. Is
this always true? Yes,

h(n, 0) - h(n, 1) = n¡ - n(n-1)¡
    = \left( n! \sum_{0 \le k \le n} \frac{(-1)^k}{k!} \right) - \left( n (n-1)! \sum_{0 \le k \le n-1} \frac{(-1)^k}{k!} \right)
    = n! \frac{(-1)^n}{n!} = (-1)^n.

In other words, n¡ = n(n - 1)¡ + (-1)^n. This is a much simpler recurrence
for the derangement numbers than we had before.
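The simpler recurrence is easy to run. A minimal sketch (the function name `derangements` is ours), starting from 0¡ = 1:

```python
def derangements(limit):
    """Return [0¡, 1¡, ..., limit¡] using n¡ = n (n-1)¡ + (-1)^n."""
    d = [1]                                   # 0¡ = 1
    for n in range(1, limit + 1):
        d.append(n * d[n - 1] + (-1) ** n)
    return d

print(derangements(6))
```

Each new value needs only its immediate predecessor, whereas the implicit recurrence (5.49) needs all earlier values.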

Now let’s invert something else. If we apply inversion to the formula

\sum_k \binom{n}{k} \frac{(-1)^k}{x+k} = \frac{1}{x} \binom{x+n}{n}^{-1}

that we derived in (5.41), we find

\frac{x}{x+n} = \sum_{k \ge 0} \binom{n}{k} (-1)^k \binom{x+k}{k}^{-1}.

This is interesting, but not really new. If we negate the upper index in \binom{x+k}{k},
we have merely discovered identity (5.33) again.


5.4  GENERATING  FUNCTIONS 

We come now to the most important idea in this whole book, the
notion of a generating function. An infinite sequence ⟨a_0, a_1, a_2, . . .⟩ that
we wish to deal with in some way can conveniently be represented as a power
series in an auxiliary variable z,

A(z) = a_0 + a_1 z + a_2 z^2 + \cdots = \sum_{k \ge 0} a_k z^k.   (5.52)

It’s appropriate to use the letter z as the name of the auxiliary variable,
because we’ll often be thinking of z as a complex number. The theory of complex
variables conventionally uses ‘z’ in its formulas; power series (a.k.a. analytic
functions or holomorphic functions) are central to that theory.


But  inversion  is  the 
source  of  smog. 
We will be seeing lots of generating functions in subsequent chapters.
Indeed,  Chapter  7 is  entirely  devoted  to  them.  Our  present  goal  is  simply  to 
introduce  the  basic  concepts,  and  to  demonstrate  the  relevance  of  generating 
functions  to  the  study  of  binomial  coefficients. 

A generating function is useful because it’s a single quantity that
represents an entire infinite sequence. We can often solve problems by first setting
up one or more generating functions, then by fooling around with those
functions until we know a lot about them, and finally by looking again at the
coefficients. With a little bit of luck, we’ll know enough about the function
to understand what we need to know about its coefficients.

If A(z) is any power series \sum_{k \ge 0} a_k z^k, we will find it convenient to write

[z^n] A(z) = a_n;   (5.53)

in other words, [z^n] A(z) denotes the coefficient of z^n in A(z).

Let A(z) be the generating function for ⟨a_0, a_1, a_2, . . .⟩ as in (5.52), and
let B(z) be the generating function for another sequence ⟨b_0, b_1, b_2, . . .⟩. Then
the product A(z) B(z) is the power series

(a_0 + a_1 z + a_2 z^2 + \cdots)(b_0 + b_1 z + b_2 z^2 + \cdots)
    = a_0 b_0 + (a_0 b_1 + a_1 b_0) z + (a_0 b_2 + a_1 b_1 + a_2 b_0) z^2 + \cdots;

the coefficient of z^n in this product is

a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0 = \sum_{k=0}^{n} a_k b_{n-k}.

Therefore if we wish to evaluate any sum that has the general form

c_n = \sum_{k=0}^{n} a_k b_{n-k},   (5.54)

and if we know the generating functions A(z) and B(z), we have

c_n = [z^n] A(z) B(z).

The sequence ⟨c_n⟩ defined by (5.54) is called the convolution of the
sequences ⟨a_n⟩ and ⟨b_n⟩; two sequences are “convolved” by forming the sums of
all products whose subscripts add up to a given amount. The gist of the
previous paragraph is that convolution of sequences corresponds to multiplication
of their generating functions.
Generating functions give us powerful ways to discover and/or prove
identities. For example, the binomial theorem tells us that (1 + z)^r is the
generating function for the sequence ⟨\binom{r}{0}, \binom{r}{1}, \binom{r}{2}, . . .⟩:

(1 + z)^r = \sum_{k \ge 0} \binom{r}{k} z^k.

Similarly,

(1 + z)^s = \sum_{k \ge 0} \binom{s}{k} z^k.

If we multiply these together, we get another generating function:

(1 + z)^r (1 + z)^s = (1 + z)^{r+s}.

And now comes the punch line: Equating coefficients of z^n on both sides of
this equation gives us

\sum_{k=0}^{n} \binom{r}{k} \binom{s}{n-k} = \binom{r+s}{n}.

We’ve discovered Vandermonde’s convolution, (5.27)!
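To see convolution and multiplication of generating functions side by side, here is a small sketch that convolves two rows of binomial coefficients and compares the result with the row for r + s. The helper names `convolve` and `binomial_row` are our own, and the check covers only nonnegative integer r and s, although the identity itself holds more generally.

```python
from math import comb

def convolve(a, b):
    """Convolution (5.54) of two coefficient lists, truncated to the shorter length."""
    size = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(size)]

def binomial_row(r, size):
    """First `size` coefficients of (1 + z)^r for a nonnegative integer r."""
    return [comb(r, k) for k in range(size)]

r, s, size = 3, 4, 10
product = convolve(binomial_row(r, size), binomial_row(s, size))
print(product)                      # coefficients of (1 + z)^r (1 + z)^s
print(binomial_row(r + s, size))    # coefficients of (1 + z)^(r+s)
```

Both printed lists should coincide, which is Vandermonde's convolution read off coefficient by coefficient.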

That was nice and easy; let’s try another. This time we use (1 - z)^r, which
is the generating function for the sequence ⟨(-1)^n \binom{r}{n}⟩ = ⟨\binom{r}{0}, -\binom{r}{1}, \binom{r}{2}, . . .⟩.
Multiplying by (1 + z)^r gives another generating function whose coefficients
we know:

(1 - z)^r (1 + z)^r = (1 - z^2)^r.

(5.27)! = (5.27)(4.27)(3.27)(2.27)(1.27)(0.27)!.


Equating coefficients of z^n now gives the equation

\sum_{k=0}^{n} \binom{r}{k} \binom{r}{n-k} (-1)^k = (-1)^{n/2} \binom{r}{n/2} [n even].   (5.55)

We should check this on a small case or two. When n = 3, for example,
the result is

\binom{r}{0}\binom{r}{3} - \binom{r}{1}\binom{r}{2} + \binom{r}{2}\binom{r}{1} - \binom{r}{3}\binom{r}{0} = 0.

Each positive term is cancelled by a corresponding negative term. And the
same thing happens whenever n is odd, in which case the sum isn’t very
interesting. But when n is even, say n = 2, we get a nontrivial sum that’s
different from Vandermonde’s convolution:

\binom{r}{0}\binom{r}{2} - \binom{r}{1}\binom{r}{1} + \binom{r}{2}\binom{r}{0} = 2\binom{r}{2} - r^2 = -r.

So (5.55) checks out fine when n = 2. It turns out that (5.30) is a special case
of our new identity (5.55).
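Small cases of (5.55) can also be checked mechanically. The sketch below (function names are ours) restricts r to nonnegative integers so that `math.comb` applies; the identity itself is not limited to that case.

```python
from math import comb

def alternating_convolution(r, n):
    """Left side of (5.55) for nonnegative integers r and n."""
    return sum(comb(r, k) * comb(r, n - k) * (-1) ** k for k in range(n + 1))

def right_side(r, n):
    """Right side of (5.55): zero for odd n, a signed binomial coefficient for even n."""
    if n % 2:
        return 0
    return (-1) ** (n // 2) * comb(r, n // 2)

for n in range(6):
    print(n, alternating_convolution(5, n), right_side(5, n))
```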

Binomial  coefficients  also  show  up  in  some  other  generating  functions, 
most  notably  the  following  important  identities  in  which  the  lower  index 
stays  fixed  and  the  upper  index  varies: 


\frac{1}{(1 - z)^{n+1}} = \sum_{k \ge 0} \binom{n+k}{n} z^k,   integer n \ge 0;   (5.56)

\frac{z^n}{(1 - z)^{n+1}} = \sum_{k \ge 0} \binom{k}{n} z^k,   integer n \ge 0.   (5.57)

If you have a highlighter pen, these two equations have got to be marked.

The second identity here is just the first one multiplied by z^n, that is, “shifted
right” by n places. The first identity is just a special case of the binomial
theorem in slight disguise: If we expand (1 - z)^{-n-1} by (5.13), the coefficient
of z^k is \binom{-n-1}{k} (-1)^k, which can be rewritten as \binom{k+n}{k} or \binom{n+k}{n} by negating
the upper index. These special cases are worth noting explicitly, because they
arise so frequently in applications.

When n = 0 we get a special case of a special case, the geometric series:

\frac{1}{1 - z} = 1 + z + z^2 + z^3 + \cdots = \sum_{k \ge 0} z^k.

This is the generating function for the sequence ⟨1, 1, 1, . . .⟩, and it is
especially useful because the convolution of any other sequence with this one is
the sequence of sums: When b_k = 1 for all k, (5.54) reduces to

c_n = \sum_{k=0}^{n} a_k.

Therefore if A(z) is the generating function for the summands ⟨a_0, a_1, a_2, . . .⟩,
then A(z)/(1 - z) is the generating function for the sums ⟨c_0, c_1, c_2, . . .⟩.
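Dividing by 1 - z is therefore just taking running sums, and doing it n + 1 times to the constant series 1 should reproduce the coefficients in (5.56). A brief sketch, with helper names of our own choosing and all series truncated to a fixed number of terms:

```python
from itertools import accumulate
from math import comb

def divide_by_one_minus_z(coeffs):
    """Coefficients of A(z)/(1 - z): the running sums of the coefficients of A(z)."""
    return list(accumulate(coeffs))

def inverse_power(n, size):
    """First `size` coefficients of 1/(1 - z)^(n+1), dividing 1 by (1 - z) repeatedly."""
    series = [1] + [0] * (size - 1)     # the constant power series 1
    for _ in range(n + 1):
        series = divide_by_one_minus_z(series)
    return series

print(divide_by_one_minus_z([1, 2, 3, 4]))      # partial sums of 1, 2, 3, 4
print(inverse_power(3, 8))                      # compare with binom(3 + k, 3)
print([comb(3 + k, 3) for k in range(8)])
```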

The problem of derangements, which we solved by inversion in connection
with hats and football fans, can be resolved with generating functions in an
interesting way. The basic recurrence

n! = \sum_k \binom{n}{k} (n - k)¡

can be put into the form of a convolution if we expand \binom{n}{k} in factorials and
divide both sides by n!:

1 = \sum_{k=0}^{n} \frac{1}{k!} \frac{(n-k)¡}{(n-k)!}.

The generating function for the sequence ⟨1/0!, 1/1!, 1/2!, . . .⟩ is e^z; hence if we let

D(z) = \sum_{k \ge 0} \frac{k¡}{k!} z^k,

the convolution/recurrence tells us that

\frac{1}{1 - z} = e^z D(z).

Solving for D(z) gives

D(z) = \frac{1}{1 - z} e^{-z} = \frac{1}{1 - z} \left( \frac{1}{0!} z^0 - \frac{1}{1!} z^1 + \frac{1}{2!} z^2 - \cdots \right).

Equating coefficients of z^n now tells us that

\frac{n¡}{n!} = \sum_{k=0}^{n} \frac{(-1)^k}{k!};

this is the formula we derived earlier by inversion.

So far our explorations with generating functions have given us slick
proofs of things that we already knew how to derive by more cumbersome
methods. But we haven’t used generating functions to obtain any new
results, except for (5.55). Now we’re ready for something new and more
surprising. There are two families of power series that generate an especially rich
class of binomial coefficient identities: Let us define the generalized binomial
series \mathcal{B}_t(z) and the generalized exponential series \mathcal{E}_t(z) as follows:

\mathcal{B}_t(z) = \sum_{k \ge 0} (tk)^{\underline{k-1}} \frac{z^k}{k!};   \mathcal{E}_t(z) = \sum_{k \ge 0} (tk + 1)^{k-1} \frac{z^k}{k!}.   (5.58)

It can be shown that these functions satisfy the identities

\mathcal{B}_t(z)^{1-t} - \mathcal{B}_t(z)^{-t} = z;   \mathcal{E}_t(z)^{-t} \ln \mathcal{E}_t(z) = z.   (5.59)


In the special case t = 0, we have

\mathcal{B}_0(z) = 1 + z;   \mathcal{E}_0(z) = e^z;

this explains why the series with parameter t are called “generalized”
binomials and exponentials.

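The following pairs of identities are valid for all real r:

\mathcal{B}_t(z)^r = \sum_{k \ge 0} \binom{tk+r}{k} \frac{r}{tk+r} z^k;   \mathcal{E}_t(z)^r = \sum_{k \ge 0} r \frac{(tk+r)^{k-1}}{k!} z^k;   (5.60)

\frac{\mathcal{B}_t(z)^r}{1 - t + t \mathcal{B}_t(z)^{-1}} = \sum_{k \ge 0} \binom{tk+r}{k} z^k;   \frac{\mathcal{E}_t(z)^r}{1 - z t \mathcal{E}_t(z)^t} = \sum_{k \ge 0} \frac{(tk+r)^k}{k!} z^k.   (5.61)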


(When tk + r = 0, we have to be a little careful about how the coefficient
of z^k is interpreted; each coefficient is a polynomial in r. For example, the
constant term of \mathcal{E}_t(z)^r is r(0 + r)^{-1}, and this is equal to 1 even when r = 0.)

Since equations (5.60) and (5.61) hold for all r, we get very general
identities when we multiply together the series that correspond to different powers
r and s. For example,

\mathcal{B}_t(z)^r \frac{\mathcal{B}_t(z)^s}{1 - t + t \mathcal{B}_t(z)^{-1}} = \sum_{k \ge 0} \binom{tk+r}{k} \frac{r}{tk+r} z^k \sum_{j \ge 0} \binom{tj+s}{j} z^j
    = \sum_{n \ge 0} z^n \sum_{k \ge 0} \binom{tk+r}{k} \frac{r}{tk+r} \binom{t(n-k)+s}{n-k}.

This power series must equal

\frac{\mathcal{B}_t(z)^{r+s}}{1 - t + t \mathcal{B}_t(z)^{-1}} = \sum_{n \ge 0} \binom{tn+r+s}{n} z^n;

hence we can equate coefficients of z^n and get the identity

\sum_k \binom{tk+r}{k} \binom{t(n-k)+s}{n-k} \frac{r}{tk+r} = \binom{tn+r+s}{n},   integer n,

valid for all real r, s, and t. When t = 0 this identity reduces to
Vandermonde’s convolution. (If by chance tk + r happens to equal zero in this
formula, the denominator factor tk + r should be considered to cancel with
the tk + r in the numerator of the binomial coefficient. Both sides of the
identity are polynomials in r, s, and t.) Similar identities hold when we multiply
\mathcal{B}_t(z)^r by \mathcal{B}_t(z)^s, etc.; Table 202 presents the results.
Table 202 General convolution identities, valid for integer n \ge 0.

\sum_k \binom{tk+r}{k} \binom{tn-tk+s}{n-k} \frac{r}{tk+r} = \binom{tn+r+s}{n}.   (5.62)

\sum_k \binom{tk+r}{k} \binom{tn-tk+s}{n-k} \frac{r}{tk+r} \cdot \frac{s}{tn-tk+s} = \binom{tn+r+s}{n} \frac{r+s}{tn+r+s}.   (5.63)

\sum_k \binom{n}{k} (tk+r)^k (tn-tk+s)^{n-k} \frac{r}{tk+r} = (tn+r+s)^n.   (5.64)

\sum_k \binom{n}{k} (tk+r)^k (tn-tk+s)^{n-k} \frac{r}{tk+r} \cdot \frac{s}{tn-tk+s} = (tn+r+s)^n \frac{r+s}{tn+r+s}.   (5.65)
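Identities this general are worth testing on concrete numbers. The following sketch checks (5.62) and (5.64) in exact rational arithmetic; the helper `binom` (a binomial coefficient with rational upper index) and the sample values of r, s, and t are our own choices, picked so that tk + r never vanishes.

```python
from fractions import Fraction
from math import comb

def binom(x, k):
    """Binomial coefficient with rational upper index x and integer lower index k."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (x - j) / (j + 1)
    return result

def lhs_562(r, s, t, n):
    return sum(binom(t * k + r, k) * binom(t * (n - k) + s, n - k) * r / (t * k + r)
               for k in range(n + 1))

def lhs_564(r, s, t, n):
    return sum(comb(n, k) * (t * k + r) ** k * (t * (n - k) + s) ** (n - k) * r / (t * k + r)
               for k in range(n + 1))

r, s, t = Fraction(1, 2), Fraction(-2, 3), Fraction(3, 2)
for n in range(5):
    print(n, lhs_562(r, s, t, n) == binom(t * n + r + s, n),
             lhs_564(r, s, t, n) == (t * n + r + s) ** n)
```

Every line should print two `True` values. Setting t = 0 in `lhs_562` reduces the check to Vandermonde's convolution.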


We have learned that it’s generally a good idea to look at special cases of
general results. What happens, for example, if we set t = 1? The generalized
binomial \mathcal{B}_1(z) is very simple; it’s just

\mathcal{B}_1(z) = \sum_{k \ge 0} z^k = \frac{1}{1 - z};

therefore \mathcal{B}_1(z) doesn’t give us anything we didn’t already know from
Vandermonde’s convolution. But \mathcal{E}_1(z) is an important function,

\mathcal{E}(z) = \sum_{k \ge 0} (k + 1)^{k-1} \frac{z^k}{k!} = 1 + z + \frac{3}{2} z^2 + \frac{8}{3} z^3 + \frac{125}{24} z^4 + \cdots   (5.66)

that we haven’t seen before; it satisfies the basic identity

\mathcal{E}(z) = e^{z \mathcal{E}(z)}.   (5.67)

This  function,  first  studied  by  Eisenstein  [75],  arises  in  many  applications. 

Aha! This is the iterated power function \mathcal{E}(\ln z) = z^{z^{z^{\cdot^{\cdot^{\cdot}}}}} that I’ve often
wondered about.

Zzzzzz. . .

The special cases t = 2 and t = -1 of the generalized binomial are of
particular interest, because their coefficients occur again and again in
problems that have a recursive structure. Therefore it’s useful to display these
series explicitly for future reference:

\mathcal{B}_2(z) = \sum_k \binom{2k}{k} \frac{z^k}{1+k} = \sum_k \binom{2k+1}{k} \frac{z^k}{1+2k} = \frac{1 - \sqrt{1 - 4z}}{2z}.   (5.68)

\mathcal{B}_{-1}(z) = \sum_k \binom{1-k}{k} \frac{z^k}{1-k} = \sum_k \binom{2k-1}{k} \frac{(-z)^k}{1-2k} = \frac{1 + \sqrt{1 + 4z}}{2}.   (5.69)

\mathcal{B}_2(z)^r = \sum_k \binom{2k+r}{k} \frac{r}{2k+r} z^k.   (5.70)

\mathcal{B}_{-1}(z)^r = \sum_k \binom{r-k}{k} \frac{r}{r-k} z^k.   (5.71)

\frac{\mathcal{B}_2(z)^r}{\sqrt{1 - 4z}} = \sum_k \binom{2k+r}{k} z^k.   (5.72)

\frac{\mathcal{B}_{-1}(z)^{r+1}}{\sqrt{1 + 4z}} = \sum_k \binom{r-k}{k} z^k.   (5.73)


The coefficients \binom{2n}{n} \frac{1}{n+1} of \mathcal{B}_2(z) are called the Catalan numbers C_n, because
Eugène Catalan wrote an influential paper about them in the 1830s [46]. The
sequence begins as follows:

n     0   1   2   3    4    5     6     7      8      9      10
C_n   1   1   2   5   14   42   132   429   1430   4862   16796

The coefficients of \mathcal{B}_{-1}(z) are essentially the same, but there’s an extra 1 at the
beginning and the other numbers alternate in sign: ⟨1, 1, -1, 2, -5, 14, . . .⟩.
Thus \mathcal{B}_{-1}(z) = 1 + z \mathcal{B}_2(-z). We also have \mathcal{B}_{-1}(z) = \mathcal{B}_2(-z)^{-1}.
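The Catalan table can be regenerated in two ways. One uses the coefficient formula directly; the other uses a step the text does not spell out: multiplying the first identity of (5.59) with t = 2 by \mathcal{B}_2(z)^2 gives \mathcal{B}_2(z) = 1 + z \mathcal{B}_2(z)^2, so the coefficients satisfy a convolution recurrence. The sketch below, with function names of our own, compares the two.

```python
from math import comb

def catalan_by_formula(n):
    """C_n = binom(2n, n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)

def catalan_by_convolution(count):
    """First `count` Catalan numbers from B = 1 + z B^2, i.e. C_{n+1} = sum of C_k C_{n-k}."""
    c = [1]
    for n in range(count - 1):
        c.append(sum(c[k] * c[n - k] for k in range(n + 1)))
    return c

print(catalan_by_convolution(11))
print([catalan_by_formula(n) for n in range(11)])
```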

Let’s close this section by deriving an important consequence of (5.72)
and (5.73), a relation that shows further connections between the functions
\mathcal{B}_{-1}(z) and \mathcal{B}_2(-z):

\frac{\mathcal{B}_{-1}(z)^{n+1} - (-z)^{n+1} \mathcal{B}_2(-z)^{n+1}}{\sqrt{1 + 4z}} = \sum_{k \le n} \binom{n-k}{k} z^k.
This holds because the coefficient of z^k in (-z)^{n+1} \mathcal{B}_2(-z)^{n+1} / \sqrt{1 + 4z} is

[z^k] \frac{(-z)^{n+1} \mathcal{B}_2(-z)^{n+1}}{\sqrt{1 + 4z}} = (-1)^{n+1} [z^{k-n-1}] \frac{\mathcal{B}_2(-z)^{n+1}}{\sqrt{1 + 4z}}
    = (-1)^{n+1} (-1)^{k-n-1} [z^{k-n-1}] \frac{\mathcal{B}_2(z)^{n+1}}{\sqrt{1 - 4z}}
    = (-1)^k \binom{2(k-n-1)+n+1}{k-n-1}
    = (-1)^k \binom{2k-n-1}{k-n-1} = (-1)^k \binom{2k-n-1}{k}
    = \binom{n-k}{k} = [z^k] \frac{\mathcal{B}_{-1}(z)^{n+1}}{\sqrt{1 + 4z}}

when k > n. The terms nicely cancel each other out. We can now use (5.68)
and (5.69) to obtain the closed form

\sum_{k \le n} \binom{n-k}{k} z^k = \frac{1}{\sqrt{1 + 4z}} \left( \left( \frac{1 + \sqrt{1 + 4z}}{2} \right)^{n+1} - \left( \frac{1 - \sqrt{1 + 4z}}{2} \right)^{n+1} \right),
    integer n \ge 0.   (5.74)


(The special case z = -1 came up in Problem 3 of Section 5.2. Since the
numbers \frac{1}{2}(1 ± \sqrt{-3}) are sixth roots of unity, the sums \sum_{k \le n} \binom{n-k}{k} (-1)^k
have the periodic behavior we observed in that problem.) Similarly we can
combine (5.70) with (5.71) to cancel the large coefficients and get

\sum_{k < n} \binom{n-k}{k} \frac{n}{n-k} z^k = \left( \frac{1 + \sqrt{1 + 4z}}{2} \right)^n + \left( \frac{1 - \sqrt{1 + 4z}}{2} \right)^n,
    integer n > 0.   (5.75)
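Both closed forms can be tested without any floating point by choosing z = 2, because then \sqrt{1 + 4z} = 3 and the two bases become 2 and -1. The sketch below does this and also prints the period-6 behavior at z = -1; the function names and the choice z = 2 are ours.

```python
from fractions import Fraction
from math import comb

def lhs_574(n, z):
    """Sum over 0 <= k <= n of binom(n-k, k) z^k; terms with k > n/2 vanish."""
    return sum(comb(n - k, k) * z ** k for k in range(n + 1))

def lhs_575(n, z):
    """Sum over 0 <= k < n of binom(n-k, k) * n/(n-k) * z^k, for n > 0."""
    return sum(comb(n - k, k) * Fraction(n, n - k) * z ** k for k in range(n))

for n in range(1, 8):
    rhs_574 = Fraction(2 ** (n + 1) - (-1) ** (n + 1), 3)   # (5.74) with z = 2
    rhs_575 = 2 ** n + (-1) ** n                            # (5.75) with z = 2
    print(n, lhs_574(n, 2) == rhs_574, lhs_575(n, 2) == rhs_575)

print([lhs_574(n, -1) for n in range(12)])   # repeats with period 6
```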
5.5 HYPERGEOMETRIC FUNCTIONS

The methods we’ve been applying to binomial coefficients are very
effective, when they work, but we must admit that they often appear to be
ad hoc, more like tricks than techniques. When we’re working on a problem,
we often have many directions to pursue, and we might find ourselves going
around in circles. Binomial coefficients are like chameleons, changing their
appearance easily. Therefore it’s natural to ask if there isn’t some unifying
principle that will systematically handle a great variety of binomial coefficient
summations all at once. Fortunately, the answer is yes. The unifying principle
is based on the theory of certain infinite sums called hypergeometric series.


They’re even more versatile than chameleons; we can dissect them and put
them back together in different ways.

Anything that has survived for centuries with such awesome notation must be
really useful.


The study of hypergeometric series was launched many years ago by
Euler, Gauss, and Riemann; such series, in fact, are still the subject of
considerable research. But hypergeometrics have a somewhat formidable notation,
which takes a little time to get used to.

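The general hypergeometric series is a power series in z with m + n
parameters, and it is defined as follows in terms of rising factorial powers:

F\left( {a_1, \ldots, a_m \atop b_1, \ldots, b_n} \,\middle|\, z \right) = \sum_{k \ge 0} \frac{a_1^{\overline{k}} \cdots a_m^{\overline{k}}}{b_1^{\overline{k}} \cdots b_n^{\overline{k}}} \frac{z^k}{k!}.   (5.76)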


To avoid division by zero, none of the b’s may be zero or a negative integer.
Other than that, the a’s and b’s may be anything we like. The notation
‘F(a_1, . . . , a_m; b_1, . . . , b_n; z)’ is also used as an alternative to the two-line form
(5.76), since a one-line form sometimes works better typographically. The a’s
are said to be upper parameters; they occur in the numerator of the terms
of F. The b’s are lower parameters, and they occur in the denominator. The
final quantity z is called the argument.

Standard reference books often use ‘_mF_n’ instead of ‘F’ as the name of a
hypergeometric with m upper parameters and n lower parameters. But the
extra subscripts tend to clutter up the formulas and waste our time, if we’re
compelled to write them over and over. We can count how many parameters
there are, so we usually don’t need extra additional unnecessary redundancy.

Many important functions occur as special cases of the general
hypergeometric; indeed, that’s why hypergeometrics are so powerful. For example, the
simplest case occurs when m = n = 0: There are no parameters at all, and
we get the familiar series

F\left( {\ \atop\ } \,\middle|\, z \right) = \sum_{k \ge 0} \frac{z^k}{k!} = e^z.

Actually the notation looks a bit unsettling when m or n is zero. We can add
an extra ‘1’ above and below in order to avoid this:

F\left( {1 \atop 1} \,\middle|\, z \right) = e^z.

In general we don’t change the function if we cancel a parameter that occurs
in both numerator and denominator, or if we insert two identical parameters.

The next simplest case has m = 1, a_1 = 1, and n = 0; we change the
parameters to m = 2, a_1 = a_2 = 1, n = 1, and b_1 = 1, so that n > 0. This
series also turns out to be familiar, because 1^{\overline{k}} = k!:

F\left( {1, 1 \atop 1} \,\middle|\, z \right) = \sum_{k \ge 0} z^k = \frac{1}{1 - z}.

It’s our old friend, the geometric series; F(a_1, . . . , a_m; b_1, . . . , b_n; z) is called
hypergeometric because it includes the geometric series F(1, 1; 1; z) as a very
special case.

The general case m = 1 and n = 0 is, in fact, easy to sum in closed form,

F\left( {a, 1 \atop 1} \,\middle|\, z \right) = \sum_{k \ge 0} a^{\overline{k}} \frac{z^k}{k!} = \sum_k \binom{a+k-1}{k} z^k = \frac{1}{(1 - z)^a},   (5.77)

using (5.56). If we replace a by -a and z by -z, we get the binomial theorem,

F\left( {-a, 1 \atop 1} \,\middle|\, -z \right) = (1 + z)^a.

A negative integer as upper parameter causes the infinite series to become
finite, since (-a)^{\overline{k}} = 0 whenever k > a \ge 0 and a is an integer.

The general case m = 0, n = 1 is another famous series, but it’s not as
well known in the literature of discrete mathematics:

F\left( {1 \atop b, 1} \,\middle|\, z \right) = \sum_{k \ge 0} \frac{(b-1)!}{(b-1+k)!} \frac{z^k}{k!} = I_{b-1}(2\sqrt{z}) \frac{(b-1)!}{z^{(b-1)/2}}.   (5.78)

This function I_{b-1} is called a “modified Bessel function” of order b - 1. The
special case b = 1 gives us F(1; 1, 1; z) = I_0(2\sqrt{z}), which is the interesting series
\sum_{k \ge 0} z^k / k!^2.

The special case m = n = 1 is called a “confluent hypergeometric series”
and often denoted by the letter M:

F\left( {a \atop b} \,\middle|\, z \right) = \sum_{k \ge 0} \frac{a^{\overline{k}}}{b^{\overline{k}}} \frac{z^k}{k!} = M(a, b, z).   (5.79)

This function, which has important applications to engineering, was
introduced by Ernst Kummer.

By now a few of us are wondering why we haven’t discussed convergence
of the infinite series (5.76). The answer is that we can ignore convergence if
we are using z simply as a formal symbol. It is not difficult to verify that
formal infinite sums of the form \sum_{k \ge n} \alpha_k z^k form a field, if the coefficients
\alpha_k lie in a field. We can add, subtract, multiply, divide, differentiate, and do
functional composition on such formal sums without worrying about
convergence; any identities we derive will still be formally true. For example, the
hypergeometric F(1, 1, 1; 1; z) = \sum_{k \ge 0} k! z^k doesn’t converge for any nonzero z,
yet we’ll see in Chapter 7 that we can still use it to solve problems. On the
other hand, whenever we replace z by a particular numerical value, we do
have to be sure that the infinite sum is well defined.
“There must be many universities to-day where 95 per cent, if not 100 per cent,
of the functions studied by physics, engineering, and even mathematics
students, are covered by this single symbol F(a, b; c; x).”

— W.  W.  Sawyer  [257] 


The next step up in complication is actually the most famous
hypergeometric of all. In fact, it was the hypergeometric series until about 1870, when
everything was generalized to arbitrary m and n. This one has two upper
parameters and one lower parameter:

F\left( {a, b \atop c} \,\middle|\, z \right) = \sum_{k \ge 0} \frac{a^{\overline{k}} b^{\overline{k}}}{c^{\overline{k}}} \frac{z^k}{k!}.   (5.80)

It is often called the Gaussian hypergeometric, because many of its subtle
properties were first proved by Gauss in his doctoral dissertation of 1812 [116],
although Euler [95] and Pfaff [233] had already discovered some remarkable
things about it. One of its important special cases is

\ln(1 + z) = z F\left( {1, 1 \atop 2} \,\middle|\, -z \right) = z \sum_{k \ge 0} \frac{k! \, k!}{(k+1)!} \frac{(-z)^k}{k!}
    = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + \cdots.

Notice that z^{-1} \ln(1 + z) is a hypergeometric function, but \ln(1 + z) itself cannot
be hypergeometric, since a hypergeometric series always has the value 1 when
z = 0.

So far hypergeometrics haven’t actually done anything for us except
provide an excuse for name-dropping. But we’ve seen that several very different
functions can all be regarded as hypergeometric; this will be the main point of
interest in what follows. We’ll see that a large class of sums can be written as
hypergeometric series in a “canonical” way, hence we will have a good filing
system for facts about binomial coefficients.

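What series are hypergeometric? It’s easy to answer this question if we
look at the ratio between consecutive terms:

F\left( {a_1, \ldots, a_m \atop b_1, \ldots, b_n} \,\middle|\, z \right) = \sum_{k \ge 0} t_k,   t_k = \frac{a_1^{\overline{k}} \cdots a_m^{\overline{k}} z^k}{b_1^{\overline{k}} \cdots b_n^{\overline{k}} k!}.

The first term is t_0 = 1, and the other terms have ratios given by

\frac{t_{k+1}}{t_k} = \frac{a_1^{\overline{k+1}} \cdots a_m^{\overline{k+1}}}{a_1^{\overline{k}} \cdots a_m^{\overline{k}}} \cdot \frac{b_1^{\overline{k}} \cdots b_n^{\overline{k}}}{b_1^{\overline{k+1}} \cdots b_n^{\overline{k+1}}} \cdot \frac{k!}{(k+1)!} \cdot \frac{z^{k+1}}{z^k}
    = \frac{(k + a_1) \cdots (k + a_m) z}{(k + b_1) \cdots (k + b_n)(k + 1)}.   (5.81)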

This is a rational function of k, that is, a quotient of polynomials in k. Any
rational function of k can be factored over the complex numbers and put
into this form. The a’s are the negatives of the roots of the polynomial in
the numerator, and the b’s are the negatives of the roots of the polynomial
in the denominator. If the denominator doesn’t already contain the special
factor (k + 1), we can include (k + 1) in both numerator and denominator. A
constant factor remains, and we can call it z. Therefore hypergeometric series
are precisely those series whose first term is 1 and whose term ratio t_{k+1}/t_k
is a rational function of k.
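The term ratio also gives a practical way to generate the terms of a hypergeometric series one after another. The sketch below is our own illustration (the function `hypergeometric_terms` and its argument names are assumptions, not notation from the text); it uses exact rational arithmetic and starts from t_0 = 1.

```python
from fractions import Fraction

def hypergeometric_terms(upper, lower, z, count):
    """First `count` terms t_0, t_1, ... of F(upper; lower; z), built from the term ratio."""
    terms = [Fraction(1)]
    for k in range(count - 1):
        ratio = Fraction(z)
        for a in upper:
            ratio *= k + a
        for b in lower:
            ratio /= k + b      # fails if a lower parameter is 0 or a negative integer
        ratio /= k + 1
        terms.append(terms[-1] * ratio)
    return terms

# F(1, 1; 1; z) is the geometric series: terms z^k.
print(hypergeometric_terms([1, 1], [1], Fraction(1, 2), 4))
# F(-3, 1; 1; -1) is the binomial theorem for (1 + 1)^3; the series terminates.
print(sum(hypergeometric_terms([-3, 1], [1], -1, 10)))
```

A negative integer upper parameter makes the ratio vanish at some k, so every later term is zero and the sum is finite, as in the second example.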

Suppose, for example, that we’re given an infinite series with term ratio

\frac{t_{k+1}}{t_k} = \frac{k^2 + 7k + 10}{4k^2 + 1},

a rational function of k. The numerator polynomial splits nicely into two
factors, (k + 2)(k + 5), and the denominator is 4(k + i/2)(k - i/2). Since the
denominator is missing the required factor (k + 1), we write the term ratio as

\frac{t_{k+1}}{t_k} = \frac{(k + 2)(k + 5)(k + 1)(1/4)}{(k + i/2)(k - i/2)(k + 1)},

and we can read off the results: The given series is

\sum_{k \ge 0} t_k = t_0 F\left( {2, 5, 1 \atop i/2, -i/2} \,\middle|\, 1/4 \right).

Thus, we have a general method for finding the hypergeometric
representation of a given quantity S, when such a representation is possible: First we
write S as an infinite series whose first term is nonzero. We choose a notation
so that the series is \sum_{k \ge 0} t_k with t_0 \ne 0. Then we calculate t_{k+1}/t_k. If the
term ratio is not a rational function of k, we’re out of luck. Otherwise we
express it in the form (5.81); this gives parameters a_1, . . . , a_m, b_1, . . . , b_n,
and an argument z, such that S = t_0 F(a_1, . . . , a_m; b_1, . . . , b_n; z).

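Gauss’s hypergeometric series can be written in the recursively factored
form

F\left( {a, b \atop c} \,\middle|\, z \right) = 1 + \frac{a}{1} \frac{b}{c} z \left( 1 + \frac{a+1}{2} \frac{b+1}{c+1} z \left( 1 + \frac{a+2}{3} \frac{b+2}{c+2} z ( 1 + \cdots ) \right) \right)

if we wish to emphasize the importance of term ratios.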

Let’s try now to reformulate the binomial coefficient identities derived
earlier in this chapter, expressing them as hypergeometrics. For example,
let’s figure out what the parallel summation law,

\sum_{k \le n} \binom{r+k}{k} = \binom{r+n+1}{n},   integer n,

looks like in hypergeometric notation. We need to write the sum as an infinite
series that starts at k = 0, so we replace k by n - k:

\sum_{k \ge 0} \binom{r+n-k}{n-k} = \sum_{k \ge 0} \frac{(r+n-k)!}{r! \, (n-k)!} = \sum_{k \ge 0} t_k.

(Now is a good time to do warmup exercise 11.)


This series is formally infinite but actually finite, because the (n - k)! in the
denominator will make t_k = 0 when k > n. (We’ll see later that 1/x! is
defined for all x, and that 1/x! = 0 when x is a negative integer. But for now,
let’s blithely disregard such technicalities until we gain more hypergeometric
experience.) The term ratio is

\frac{t_{k+1}}{t_k} = \frac{(r+n-k-1)! \, r! \, (n-k)!}{r! \, (n-k-1)! \, (r+n-k)!} = \frac{n-k}{r+n-k}
    = \frac{(k + 1)(k - n)(1)}{(k - n - r)(k + 1)}.


Furthermore t_0 = \binom{r+n}{n}. Hence the parallel summation law is equivalent to
the hypergeometric identity

\binom{r+n}{n} F\left( {1, -n \atop -n-r} \,\middle|\, 1 \right) = \binom{r+n+1}{n}.

Dividing through by \binom{r+n}{n} gives a slightly simpler version,

F\left( {1, -n \atop -n-r} \,\middle|\, 1 \right) = \frac{r+n+1}{r+1},   if \binom{r+n}{n} \ne 0.   (5.82)
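Identity (5.82) can be checked directly by summing the series from its term ratio. In this sketch (the function `hyper_sum` and the sample values are our own) r is taken to be a non-integer rational, which keeps the lower parameter -n - r away from the negative integers.

```python
from fractions import Fraction

def hyper_sum(upper, lower, z, terms):
    """Sum of the first `terms` terms of F(upper; lower; z), using the term ratio."""
    total = term = Fraction(1)
    for k in range(terms - 1):
        for a in upper:
            term *= k + a
        for b in lower:
            term /= k + b
        term = term * z / (k + 1)
        total += term
    return total

r = Fraction(1, 2)
for n in range(6):
    # The upper parameter -n makes every term beyond k = n vanish.
    value = hyper_sum([1, -n], [-n - r], 1, n + 2)
    print(n, value, (r + n + 1) / (r + 1))
```

The two printed values on each line should agree. Trying an integer r here would raise a division by zero, which is the degenerate situation that requires a limiting argument.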


Let’s do another one. The term ratio of identity (5.16),

\sum_{k \le m} \binom{r}{k} (-1)^k = (-1)^m \binom{r-1}{m},   integer m,

is (k - m)/(r - m + k + 1) = (k + 1)(k - m)(1)/(k - m + r + 1)(k + 1), after
we replace k by m - k; hence (5.16) gives a closed form for

F\left( {1, -m \atop -m+r+1} \,\middle|\, 1 \right).

This is essentially the same as the hypergeometric function on the left of
(5.82), but with m in place of n and r + 1 in place of -r. Therefore identity
(5.16) could have been derived from (5.82), the hypergeometric version of
(5.9). (No wonder we found it easy to prove (5.16) by using (5.9).)

First derangements, now degenerates.

Before we go further, we should think about degenerate cases, because
hypergeometrics are not defined when a lower parameter is zero or a negative
integer. We usually apply the parallel summation identity when r and n are
positive integers; but then -n - r is a negative integer and the hypergeometric
(5.76) is undefined. How then can we consider (5.82) to be legitimate? The
answer is that we can take the limit of F(1, -n; -n - r + \epsilon; 1) as \epsilon \to 0.

We will look at such things more closely later in this chapter, but for now
let’s just be aware that some denominators can be dynamite. It is interesting,
however, that the very first sum we’ve tried to express hypergeometrically
has turned out to be degenerate.

Another possibly sore point in our derivation of (5.82) is that we
expanded \binom{r+n-k}{n-k} as (r + n - k)!/r! (n - k)!. This expansion fails when r is a
negative integer, because (-m)! has to be \infty if the law

0! = 0 · (-1) · (-2) · . . . · (-m + 1) · (-m)!

is going to hold. Again, we need to approach integer results by considering a
limit of r + \epsilon as \epsilon \to 0.

But we defined the factorial representation \binom{r}{k} = r!/k! (r - k)! only when
r is an integer! If we want to work effectively with hypergeometrics, we need
a factorial function that is defined for all complex numbers. Fortunately there
is such a function, and it can be defined in many ways. Here’s one of the most
useful definitions of z!, actually a definition of 1/z!:


\frac{1}{z!} = \lim_{n \to \infty} \binom{n+z}{n} n^{-z}.   (5.83)

(We proved the identities originally for integer r, and used the polynomial
argument to show that they hold in general. Now we’re proving them first
for irrational r, and using a limiting argument to show that they hold for
integers!)


(See  exercise  21.  Euler  [81]  discovered  this  when  he  was  22  years  old.)  The 
limit  can  be  shown  to  exist  for  all  complex  z,  and  it  is  zero  only  when  z is  a 
negative  integer.  Another  significant  definition  is 


$$z! = \int_0^\infty t^z e^{-t}\,dt, \qquad \text{if } \Re z > -1. \qquad (5.84)$$

This integral exists only when the real part of z exceeds -1, but we can use the formula

$$z! = z\,(z-1)! \qquad (5.85)$$

to extend (5.84) to all complex z (except negative integers). Still another definition comes from Stirling’s interpolation of ln z! in (5.47). All of these approaches lead to the same generalized factorial function.

There’s a very similar function called the Gamma function, which relates to ordinary factorials somewhat as rising powers relate to falling powers. Standard reference books often use factorials and Gamma functions simultaneously, and it’s convenient to convert between them if necessary using the following formulas:

$$\Gamma(z+1) = z!; \qquad (5.86)$$

$$(-z)!\,\Gamma(z) = \frac{\pi}{\sin \pi z}. \qquad (5.87)$$

(Margin note: How do you write z to the $\overline{w}$ power, when $\overline{w}$ is the complex conjugate of w? $z^{\overline{(\overline{w})}}$)

We can use these generalized factorials to define generalized factorial powers, when z and w are arbitrary complex numbers:

$$z^{\underline{w}} = \frac{z!}{(z-w)!}; \qquad (5.88)$$

$$z^{\overline{w}} = \frac{\Gamma(z+w)}{\Gamma(z)}. \qquad (5.89)$$

The only proviso is that we must use appropriate limiting values when these formulas give $\infty/\infty$. (The formulas never give 0/0, because factorials and Gamma-function values are never zero.) A binomial coefficient can be written

$$\binom{z}{w} = \lim_{\zeta\to z}\,\lim_{\omega\to w}\,\frac{\zeta!}{\omega!\,(\zeta-\omega)!} \qquad (5.90)$$

when z and w are any complex numbers whatever.

(Margin note: I see, the lower index arrives at its limit first. That’s why $\binom{z}{w}$ is zero when w is a negative integer.)

Armed with generalized factorial tools, we can return to our goal of reducing the identities derived earlier to their hypergeometric essences. The binomial theorem (5.13) turns out to be neither more nor less than (5.77), as we might expect. So the next most interesting identity to try is Vandermonde’s convolution (5.27):

$$\sum_k \binom{r}{k}\binom{s}{n-k} = \binom{r+s}{n}, \qquad \text{integer } n.$$

The kth term here is

$$t_k = \frac{r!}{(r-k)!\,k!}\,\frac{s!}{(s-n+k)!\,(n-k)!},$$

and we are no longer too shy to use generalized factorials in these expressions. Whenever $t_k$ contains a factor like $(\alpha+k)!$, with a plus sign before the k, we get $(\alpha+k+1)!/(\alpha+k)! = k+\alpha+1$ in the term ratio $t_{k+1}/t_k$, by (5.85); this contributes the parameter ‘$\alpha+1$’ to the corresponding hypergeometric: as an upper parameter if $(\alpha+k)!$ was in the numerator of $t_k$, but as a lower parameter otherwise. Similarly, a factor like $(\alpha-k)!$ leads to $(\alpha-k-1)!/(\alpha-k)! = (-1)/(k-\alpha)$; this contributes ‘$-\alpha$’ to the opposite set of parameters (reversing the roles of upper and lower), and negates the hypergeometric argument. Factors like r!, which are independent of k, go into $t_0$ but disappear from the term ratio. Using such tricks we can predict without further calculation that the term ratio of (5.27) is

$$\frac{t_{k+1}}{t_k} = \frac{k-r}{k+1}\,\frac{k-n}{k+s-n+1}$$

times $(-1)^2 = 1$, and Vandermonde’s convolution becomes

$$\binom{s}{n}\,F\!\left({-r,\ -n \atop s-n+1}\,\middle|\,1\right) = \binom{r+s}{n}. \qquad (5.91)$$

We can use this equation to determine F(a, b; c; z) in general, when z = 1 and when b is a negative integer.

Let’s rewrite (5.91) in a form so that table lookup is easy when a new sum needs to be evaluated. The result turns out to be

$$F\!\left({a,\ b \atop c}\,\middle|\,1\right) = \frac{\Gamma(c-a-b)\,\Gamma(c)}{\Gamma(c-a)\,\Gamma(c-b)}, \qquad \text{integer } b \le 0 \text{ or } \Re c > \Re a + \Re b. \qquad (5.92)$$

Vandermonde’s convolution (5.27) covers only the case that one of the upper parameters, say b, is a nonpositive integer; but Gauss proved that (5.92) is valid also when a, b, c are complex numbers whose real parts satisfy $\Re c > \Re a + \Re b$. In other cases, the infinite series $F\!\left({a,\ b \atop c}\,\middle|\,1\right)$ doesn’t converge. When b = -n, the identity can be written more conveniently with factorial powers instead of Gamma functions:

$$F\!\left({a,\ -n \atop c}\,\middle|\,1\right) = \frac{(c-a)^{\overline{n}}}{c^{\overline{n}}} = \frac{(a-c)^{\underline{n}}}{(-c)^{\underline{n}}}, \qquad \text{integer } n \ge 0. \qquad (5.93)$$


A few  weeks  ago,  we 
were  studying  what 
Gauss  had  done  in 
kindergarten. 

Now  we’re  studying 
stuff  beyond  his 
Ph.D.  thesis. 

Is  this  intimidating 
or  what? 


It turns out that all five of the identities in Table 169 are special cases of Vandermonde’s convolution; formula (5.93) covers them all, when proper attention is paid to degenerate situations.
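As a quick sanity check of (5.93), the following sketch sums the terminating series exactly with rational arithmetic and compares it with the ratio of rising powers. It is only an illustration, not part of the original text; the helper names `hyp` and `rising` are hypothetical, and `hyp` assumes that some upper parameter is a nonpositive integer so that the series stops.

```python
from fractions import Fraction

def hyp(upper, lower, z):
    """Exact sum of a terminating hypergeometric series F(upper; lower; z).

    Assumes an upper parameter is a nonpositive integer, so the terms
    eventually vanish; a nonterminating series would loop forever.
    """
    total, term, k = Fraction(0), Fraction(1), 0
    while term != 0:
        total += term
        for a in upper:
            term *= a + k
        if term == 0:  # an upper factor vanished: the series stops here
            break
        for b in lower:
            term /= b + k
        term = term * z / (k + 1)
        k += 1
    return total

def rising(x, n):
    """Rising factorial power x(x+1)...(x+n-1)."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result

# Compare both sides of (5.93) for a few n, with arbitrary rational a and c.
a, c = Fraction(5, 3), Fraction(7, 2)
for n in range(6):
    assert hyp([a, -n], [c], 1) == rising(c - a, n) / rising(c, n)
```

Setting a = 1 and c = -n-r in the same check reproduces the value (r+n+1)/(r+1) of the parallel-summation hypergeometric discussed above.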

Notice that (5.82) is just the special case a = 1 of (5.93). Therefore we don’t really need to remember (5.82); and we don’t really need the identity (5.9) that led us to (5.82), even though Table 174 said that it was memorable. A computer program for formula manipulation, faced with the problem of evaluating $\sum_{k\le n}\binom{r+k}{k}$, could convert the sum to a hypergeometric and plug into the general identity for Vandermonde’s convolution.

Problem 1 in Section 5.2 asked for the value of

$$\sum_{k\ge 0} \binom{m}{k} \bigg/ \binom{n}{k}.$$

This problem is a natural for hypergeometrics, and after a bit of practice any hypergeometer can read off the parameters immediately as F(1, -m; -n; 1). Hmmm; that problem was yet another special takeoff on Vandermonde!

The sum in Problem 2 and Problem 4 likewise yields F(2, 1-n; 2-m; 1). (We need to replace k by k+1 first.) And the “menacing” sum in Problem 6 turns out to be just F(n+1, -n; 2; 1). Is there nothing more to sum, besides disguised versions of Vandermonde’s powerful convolution?

Well, yes, Problem 3 is a bit different. It deals with a special case of the general sum $\sum_k \binom{n-k}{k} z^k$ considered in (5.74), and this leads to a closed-form expression for


( 1 +2\n/l\,  -n 

V 1/2 


We also proved something new in (5.55), when we looked at the coefficients of $(1-z)^r(1+z)^r$:

$$F\!\left({1-c-2n,\ -2n \atop c}\,\middle|\,-1\right) = (-1)^n\,\frac{(2n)!}{n!}\,\frac{(c-1)!}{(c+n-1)!}, \qquad \text{integer } n \ge 0.$$

This is called Kummer’s formula when it’s generalized to complex numbers:

$$F\!\left({a,\ b \atop 1+b-a}\,\middle|\,-1\right) = \frac{(b/2)!}{b!}\,(b-a)^{\underline{b/2}}. \qquad (5.94)$$

(Ernst Kummer [187] proved this in 1836.)

(Margin notes: Kummer was a summer. The summer of ’36.)

It’s interesting to compare these two formulas. Replacing c by 1-2n-a, we find that the results are consistent if and only if

$$(-1)^n\,\frac{(2n)!}{n!} = \lim_{b\to -2n} \frac{(b/2)!}{b!} = \lim_{x\to -n} \frac{x!}{(2x)!} \qquad (5.95)$$

when n is a positive integer. Suppose, for example, that n = 3; then we should have $-6!/3! = \lim_{x\to -3} x!/(2x)!$. We know that (-3)! and (-6)! are both infinite; but we might choose to ignore that difficulty and to imagine that (-3)! = (-3)(-4)(-5)(-6)!, so that the two occurrences of (-6)! will cancel. Such temptations must, however, be resisted, because they lead to the wrong answer! The limit of x!/(2x)! as $x \to -3$ is not (-3)(-4)(-5) but rather -6!/3! = (-4)(-5)(-6), according to (5.95).

The right way to evaluate the limit in (5.95) is to use equation (5.87), which relates negative-argument factorials to positive-argument Gamma functions. If we replace x by $-n-\epsilon$ and let $\epsilon \to 0$, two applications of (5.87) give

$$\frac{(-n-\epsilon)!}{(-2n-2\epsilon)!}\,\frac{\Gamma(n+\epsilon)}{\Gamma(2n+2\epsilon)} = \frac{\sin(2n+2\epsilon)\pi}{\sin(n+\epsilon)\pi}.$$

Now sin(x + y) = sin x cos y + cos x sin y; so this ratio of sines is

$$\frac{\cos 2n\pi\,\sin 2\epsilon\pi}{\cos n\pi\,\sin \epsilon\pi} = (-1)^n\bigl(2 + O(\epsilon)\bigr),$$

by the methods of Chapter 9. Therefore, by (5.86), we have

$$\lim_{\epsilon\to 0} \frac{(-n-\epsilon)!}{(-2n-2\epsilon)!} = 2(-1)^n\,\frac{\Gamma(2n)}{\Gamma(n)} = 2(-1)^n\,\frac{(2n-1)!}{(n-1)!} = (-1)^n\,\frac{(2n)!}{n!},$$

as desired.
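The limit can also be probed numerically. The sketch below is an illustration only: it uses Python's `math.gamma` with z! = Γ(z+1), evaluates x!/(2x)! slightly to the left of x = -3, and compares the result with the correct value -6!/3! = -120 and with the tempting but wrong value (-3)(-4)(-5) = -60. The offset `eps` and the function name `factorial_ratio` are arbitrary choices.

```python
import math

def factorial_ratio(x):
    """x!/(2x)! computed through z! = Gamma(z + 1); x must avoid the poles."""
    return math.gamma(x + 1) / math.gamma(2 * x + 1)

n = 3
eps = 1e-6  # small offset, so that we approach x = -n without hitting a pole
approx = factorial_ratio(-n - eps)
exact = (-1) ** n * math.factorial(2 * n) // math.factorial(n)  # (-1)^n (2n)!/n!
naive = (-3) * (-4) * (-5)  # what careless cancellation of (-6)! would suggest

print(approx, exact, naive)
```

The floating-point value agrees with (5.95) only up to an error of order `eps`, so this is evidence for the limit rather than a proof.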

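Let’s complete our survey by restating the other identities we’ve seen so far in this chapter, clothing them in hypergeometric garb. The triple-binomial sum in (5.29) can be written

$$F\!\left({1-a-2n,\ 1-b-2n,\ -2n \atop a,\ b}\,\middle|\,1\right) = (-1)^n\,\frac{(2n)!}{n!}\,\frac{(a+b+2n-1)^{\overline{n}}}{a^{\overline{n}}\,b^{\overline{n}}}, \qquad \text{integer } n \ge 0.$$

When this one is generalized to complex numbers, it is called Dixon’s formula:

$$F\!\left({a,\ b,\ c \atop 1+c-a,\ 1+c-b}\,\middle|\,1\right) = \frac{(c/2)!}{c!}\,\frac{(c-a)^{\underline{c/2}}\,(c-b)^{\underline{c/2}}}{(c-a-b)^{\underline{c/2}}}, \qquad \Re a + \Re b < 1 + \Re c/2. \qquad (5.96)$$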


One of the most general formulas we’ve encountered is the triple-binomial sum (5.28), which yields Saalschütz’s identity:

$$F\!\left({a,\ b,\ -n \atop c,\ a+b-c-n+1}\,\middle|\,1\right) = \frac{(c-a)^{\overline{n}}\,(c-b)^{\overline{n}}}{c^{\overline{n}}\,(c-a-b)^{\overline{n}}} = \frac{(a-c)^{\underline{n}}\,(b-c)^{\underline{n}}}{(-c)^{\underline{n}}\,(a+b-c)^{\underline{n}}}, \qquad \text{integer } n \ge 0. \qquad (5.97)$$

This formula gives the value at z = 1 of the general hypergeometric series with three upper parameters and two lower parameters, provided that one of the upper parameters is a nonpositive integer and that $b_1 + b_2 = a_1 + a_2 + a_3 + 1$. (If the sum of the lower parameters exceeds the sum of the upper parameters by 2 instead of by 1, the formula of exercise 25 can be used to express $F(a_1, a_2, a_3; b_1, b_2; 1)$ in terms of two hypergeometrics that satisfy Saalschütz’s identity.)

Our hard-won identity in Problem 8 of Section 5.2 reduces to

$$\frac{1}{1+x}\,F\!\left({x+1,\ n+1,\ -n \atop 1,\ x+2}\,\middle|\,1\right) = (-1)^n\,x^{\underline{n}}\,x^{\underline{-n-1}}.$$

Sigh. This is just the special case c = 1 of Saalschütz’s identity (5.97), so we could have saved a lot of work by going to hypergeometrics directly!

What about Problem 7? That extra-menacing sum gives us the formula

$$F\!\left({n+1,\ m-n,\ 1,\ \frac12 \atop \frac12 m+1,\ \frac12 m+\frac12,\ 2}\,\middle|\,1\right) = \frac{m}{n},$$

which is the first case we’ve seen with three lower parameters. So it looks new. But it really isn’t; the left-hand side can be replaced by

$$F\!\left({n,\ m-n-1,\ -\frac12 \atop \frac12 m,\ \frac12 m-\frac12}\,\middle|\,1\right) - 1,$$

using exercise 26, and Saalschütz’s identity wins again.

(Historical note: The great relevance of hypergeometric series to binomial coefficient identities was first pointed out by George Andrews in 1974 [9, section 5].)

Well, that’s another deflating experience, but it’s also another reason to appreciate the power of hypergeometric methods.

The convolution identities in Table 202 do not have hypergeometric equivalents, because their term ratios are rational functions of k only when t is an integer. Equations (5.64) and (5.65) aren’t hypergeometric even when t = 1. But we can take note of what (5.62) tells us when t has small integer values:

$$F\!\left({\frac12 r,\ \frac12 r+\frac12,\ -n,\ -n-s \atop r+1,\ -n-\frac12 s,\ -n-\frac12 s+\frac12}\,\middle|\,1\right) = \binom{r+s+2n}{n} \bigg/ \binom{s+2n}{n};$$

$$F\!\left({\frac13 r,\ \frac13 r+\frac13,\ \frac13 r+\frac23,\ -n,\ -n-\frac12 s,\ -n-\frac12 s+\frac12 \atop \frac12 r+\frac12,\ \frac12 r+1,\ -n-\frac13 s,\ -n-\frac13 s+\frac13,\ -n-\frac13 s+\frac23}\,\middle|\,1\right) = \binom{r+s+3n}{n} \bigg/ \binom{s+3n}{n}.$$

The first of these formulas gives the result of Problem 7 again, when the quantities (r, s, n) are replaced respectively by (1, m-2n-1, n-m).

Finally, the “unexpected” sum (5.20) gives us an unexpected hypergeometric identity that turns out to be quite instructive. Let’s look at it in slow motion. First we convert to an infinite sum,

$$\sum_{k\le m} \binom{m+k}{k}\,2^{-k} = 2^m \quad\Longleftrightarrow\quad \sum_{k\ge 0} \binom{2m-k}{m-k}\,2^k = 2^{2m}.$$

The term ratio from $(2m-k)!\,2^k/m!\,(m-k)!$ is 2(k-m)/(k-2m), so we have a hypergeometric identity with z = 2:

$$\binom{2m}{m}\,F\!\left({1,\ -m \atop -2m}\,\middle|\,2\right) = 2^{2m}, \qquad \text{integer } m \ge 0. \qquad (5.98)$$

But look at the lower parameter ‘-2m’. Negative integers are verboten, so this identity is undefined!

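It’s high time to look at such limiting cases carefully, as promised earlier, because degenerate hypergeometrics can often be evaluated by approaching them from nearby nondegenerate points. We must be careful when we do this, because different results can be obtained if we take limits in different ways. For example, here are two limits that turn out to be quite different when one of the upper parameters is increased by $\epsilon$:

$$\lim_{\epsilon\to0} F\!\left({-1+\epsilon,\ -3 \atop -2+\epsilon}\,\middle|\,1\right) = \lim_{\epsilon\to0}\left(1 + \frac{(-1+\epsilon)(-3)}{(-2+\epsilon)\,1!} + \frac{(-1+\epsilon)(\epsilon)(-3)(-2)}{(-2+\epsilon)(-1+\epsilon)\,2!} + \frac{(-1+\epsilon)(\epsilon)(1+\epsilon)(-3)(-2)(-1)}{(-2+\epsilon)(-1+\epsilon)(\epsilon)\,3!}\right)$$

$$= 1 - \tfrac32 + 0 + \tfrac12 = 0;$$

$$\lim_{\epsilon\to0} F\!\left({-1,\ -3 \atop -2+\epsilon}\,\middle|\,1\right) = \lim_{\epsilon\to0}\left(1 + \frac{(-1)(-3)}{(-2+\epsilon)\,1!} + 0 + 0\right) = 1 - \tfrac32 + 0 + 0 = -\tfrac12.$$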
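The two limits can be reproduced with exact rational arithmetic. This is a sketch under the assumption that a small rational `eps` stands in for the limiting process; the helper `hyp` is a hypothetical name and only handles series that terminate.

```python
from fractions import Fraction

def hyp(upper, lower, z):
    """Exact sum of a terminating hypergeometric series F(upper; lower; z)."""
    total, term, k = Fraction(0), Fraction(1), 0
    while term != 0:
        total += term
        for a in upper:
            term *= a + k
        if term == 0:  # an upper factor vanished: the series stops here
            break
        for b in lower:
            term /= b + k
        term = term * z / (k + 1)
        k += 1
    return total

eps = Fraction(1, 10**6)

# Upper parameter -1 perturbed together with the lower parameter -2.
first = hyp([-1 + eps, -3], [-2 + eps], 1)
# Only the lower parameter perturbed: the series is cut off after k = 1.
second = hyp([-1, -3], [-2 + eps], 1)

print(first, float(second))
```

With these parameters the first sum is exactly zero for every nonzero `eps`, while the second differs from -1/2 by a quantity of order `eps`, which matches the two limits displayed above.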


Similarly, we have defined $\binom{-1}{-1} = 0 = \lim_{\epsilon\to0}\binom{-1+\epsilon}{-1}$; this is not the same as $\lim_{\epsilon\to0}\binom{-1+\epsilon}{-1+\epsilon} = 1$. The proper way to treat (5.98) as a limit is to realize that the upper parameter -m is being used to make all terms of the series $\sum_{k\ge0}\binom{2m-k}{m-k}2^k$ zero for k > m; this means that we want to make the following more precise statement:

$$\binom{2m}{m}\,\lim_{\epsilon\to0} F\!\left({1,\ -m \atop -2m+\epsilon}\,\middle|\,2\right) = 2^{2m}, \qquad \text{integer } m \ge 0. \qquad (5.99)$$

Each term of this limit is well defined, because the denominator factor $(-2m)^{\overline{k}}$ does not become zero until k > 2m. Therefore this limit gives us exactly the sum (5.20) we began with.
It should be clear by now that a database of known hypergeometric closed forms is a useful tool for doing sums of binomial coefficients. We simply convert any given sum into its canonical hypergeometric form, then look it up in the table. If it’s there, fine, we’ve got the answer. If not, we can add it to the database if the sum turns out to be expressible in closed form. We might also include entries in the table that say, “This sum does not have a simple closed form in general.” For example, the sum $\sum_{k\le m}\binom{n}{k}$ corresponds to the hypergeometric

$$\binom{n}{m}\,F\!\left({1,\ -m \atop n-m+1}\,\middle|\,-1\right), \qquad \text{integers } n \ge m \ge 0; \qquad (5.100)$$

this has a simple closed form only if m is near 0, n/2, or n.

(Margin note: The hypergeometric database should really be a “knowledge base.”)

But  there’s  more  to  the  story,  since  hypergeometric  functions  also  obey 
identities  of  their  own.  This  means  that  every  closed  form  for  hypergeometrics 
leads  to  additional  closed  forms  and  to  additional  entries  in  the  database.  For 
example,  the  identities  in  exercises  25  and  26  tell  us  how  to  transform  one 
hypergeometric  into  two  others  with  similar  but  different  parameters.  These 
can  in  turn  be  transformed  again. 

In 1793, J. F. Pfaff discovered a surprising reflection law,

$$\frac{1}{(1-z)^a}\,F\!\left({a,\ b \atop c}\,\middle|\,\frac{-z}{1-z}\right) = F\!\left({a,\ c-b \atop c}\,\middle|\,z\right), \qquad (5.101)$$

which is a transformation of another type. This is a formal identity in power series, if the quantity $(-z)^k/(1-z)^{k+a}$ is replaced by the infinite series $(-z)^k\bigl(1 + \binom{k+a}{1}z + \binom{k+a+1}{2}z^2 + \cdots\bigr)$ when the left-hand side is expanded (see exercise 50). We can use this law to derive new formulas from the identities we already know, when $z \ne 1$.

For example, Kummer’s formula (5.94) can be combined with the reflection law (5.101) if we choose the parameters so that both identities apply:

$$2^{-a}\,F\!\left({a,\ 1-a \atop 1+b-a}\,\middle|\,\frac12\right) = F\!\left({a,\ b \atop 1+b-a}\,\middle|\,-1\right) = \frac{(b/2)!}{b!}\,(b-a)^{\underline{b/2}}. \qquad (5.102)$$

We can now set a = -n and go back from this equation to a new identity in binomial coefficients that we might need some day:

$$\sum_{k\ge0} \frac{(-n)^{\overline{k}}\,(1+n)^{\overline{k}}}{(1+b+n)^{\overline{k}}}\,\frac{2^{-k}}{k!} = 2^{-n}\,\frac{(b/2)!\,(b+n)!}{b!\,(b/2+n)!}, \qquad \text{integer } n \ge 0. \qquad (5.103)$$

For example, when n = 3 this identity says that

$$1 - 3\,\frac{4}{2(4+b)} + 3\,\frac{4\cdot5}{4(4+b)(5+b)} - \frac{4\cdot5\cdot6}{8(4+b)(5+b)(6+b)} = \frac{(b+3)(b+2)(b+1)}{(b+6)(b+4)(b+2)}.$$

It’s almost unbelievable, but true, for all b. (Except when a factor in the denominator vanishes.)
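The n = 3 case is easy to test with exact rational arithmetic. The sketch below is an illustration only; `hyp`, `lhs`, and `rhs_n3` are hypothetical helper names, and the sample values of b are arbitrary.

```python
from fractions import Fraction

def hyp(upper, lower, z):
    """Exact sum of a terminating hypergeometric series F(upper; lower; z)."""
    total, term, k = Fraction(0), Fraction(1), 0
    while term != 0:
        total += term
        for a in upper:
            term *= a + k
        if term == 0:  # an upper factor vanished: the series stops here
            break
        for b in lower:
            term /= b + k
        term = term * z / (k + 1)
        k += 1
    return total

def lhs(n, b):
    """Left side of (5.103): F(-n, 1+n; 1+b+n; 1/2)."""
    return hyp([-n, 1 + n], [1 + b + n], Fraction(1, 2))

def rhs_n3(b):
    """Right side of the n = 3 instance, written as a rational function of b."""
    return Fraction((b + 3) * (b + 2) * (b + 1)) / ((b + 6) * (b + 4) * (b + 2))

for b in (Fraction(0), Fraction(1), Fraction(7, 3), Fraction(-1, 2)):
    assert lhs(3, b) == rhs_n3(b)
```

Values of b that make a denominator vanish (for example b = -4) are excluded, in line with the parenthetical remark above.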

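This is fun; let’s try again. Maybe we’ll find a formula that will really astonish our friends. What does Pfaff’s reflection law tell us if we apply it to the strange form (5.99), where z = 2? In this case we set a = -m, b = 1, and $c = -2m+\epsilon$, obtaining

$$(-1)^m \lim_{\epsilon\to0} F\!\left({-m,\ 1 \atop -2m+\epsilon}\,\middle|\,2\right) = \lim_{\epsilon\to0} F\!\left({-m,\ -2m-1+\epsilon \atop -2m+\epsilon}\,\middle|\,2\right)$$

$$= \lim_{\epsilon\to0} \sum_{k\ge0} \frac{(-m)^{\overline{k}}\,(-2m-1+\epsilon)^{\overline{k}}}{(-2m+\epsilon)^{\overline{k}}}\,\frac{2^k}{k!} = \sum_{k\le m} \binom{m}{k}\,\frac{(2m+1)^{\underline{k}}}{(2m)^{\underline{k}}}\,(-2)^k,$$

because none of the limiting terms is close to zero. This leads to another miraculous formula,

$$\sum_{k\le m} \binom{m}{k}\,\frac{2m+1}{2m+1-k}\,(-2)^k = (-1)^m\,2^{2m} \bigg/ \binom{2m}{m} = 1 \bigg/ \binom{-1/2}{m}, \qquad \text{integer } m \ge 0. \qquad (5.104)$$

When m = 3, for example, the sum is

$$1 - 7 + \frac{84}{5} - 14 = -\frac{16}{5},$$

and $\binom{-1/2}{3}$ is indeed equal to $-\frac{5}{16}$.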
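A direct check of (5.104) for small m, again as an illustrative sketch with hypothetical function names `lhs` and `rhs`:

```python
from fractions import Fraction
from math import comb

def lhs(m):
    """Sum over 0 <= k <= m of C(m,k) (2m+1)/(2m+1-k) (-2)^k."""
    return sum(
        Fraction(comb(m, k) * (2 * m + 1) * (-2) ** k, 2 * m + 1 - k)
        for k in range(m + 1)
    )

def rhs(m):
    """(-1)^m 2^(2m) / C(2m, m)."""
    return Fraction((-1) ** m * 4 ** m, comb(2 * m, m))

for m in range(8):
    assert lhs(m) == rhs(m)

print(lhs(3))  # the m = 3 case worked out in the text
```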

When we looked at our binomial coefficient identities and converted them to hypergeometric form, we overlooked (5.19) because it was a relation between two sums instead of a closed form. But now we can regard (5.19) as an identity between hypergeometric series. If we differentiate it n times with respect to y and then replace k by m-n-k, we get

$$\sum_{k\ge0} \binom{m+r}{m-n-k}\binom{n+k}{n}\,x^{m-n-k}\,y^k = \sum_{k\ge0} \binom{-r}{m-n-k}\binom{n+k}{n}\,(-x)^{m-n-k}\,(x+y)^k.$$

This yields the following hypergeometric transformation:

$$F\!\left({a,\ -n \atop c}\,\middle|\,z\right) = \frac{(a-c)^{\underline{n}}}{(-c)^{\underline{n}}}\,F\!\left({a,\ -n \atop 1-n+a-c}\,\middle|\,1-z\right), \qquad \text{integer } n \ge 0. \qquad (5.105)$$

Notice that when z = 1 this reduces to Vandermonde’s convolution, (5.93).

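Differentiation seems to be useful, if this example is any indication; we also found it helpful in Chapter 2, when summing $x + 2x^2 + \cdots + nx^n$. Let’s see what happens when a general hypergeometric series is differentiated with respect to z:

$$\frac{d}{dz}\,F\!\left({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right) = \sum_{k\ge1} \frac{a_1^{\overline{k}}\ldots a_m^{\overline{k}}\;z^{k-1}}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\;(k-1)!}$$

$$= \sum_{k+1\ge1} \frac{a_1^{\overline{k+1}}\ldots a_m^{\overline{k+1}}\;z^{k}}{b_1^{\overline{k+1}}\ldots b_n^{\overline{k+1}}\;k!}$$

$$= \sum_{k\ge0} \frac{a_1(a_1+1)^{\overline{k}}\ldots a_m(a_m+1)^{\overline{k}}\;z^{k}}{b_1(b_1+1)^{\overline{k}}\ldots b_n(b_n+1)^{\overline{k}}\;k!}$$

$$= \frac{a_1\ldots a_m}{b_1\ldots b_n}\,F\!\left({a_1+1,\ldots,a_m+1 \atop b_1+1,\ldots,b_n+1}\,\middle|\,z\right). \qquad (5.106)$$

The parameters move out and shift up.

(Margin note: How do you pronounce $\vartheta$? (Dunno, but TeX calls it ‘vartheta’.))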

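It’s also possible to use differentiation to tweak just one of the parameters while holding the rest of them fixed. For this we use the operator

$$\vartheta = z\,\frac{d}{dz},$$

which acts on a function by differentiating it and then multiplying by z. This operator gives

$$\vartheta F\!\left({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right) = z\sum_{k\ge1} \frac{a_1^{\overline{k}}\ldots a_m^{\overline{k}}\;z^{k-1}}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\;(k-1)!} = \sum_{k\ge0} \frac{k\,a_1^{\overline{k}}\ldots a_m^{\overline{k}}\;z^{k}}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\;k!},$$

which by itself isn’t too useful. But if we multiply F by one of its upper parameters, say $a_1$, and add $\vartheta F$, we get

$$(\vartheta + a_1)\,F\!\left({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right) = \sum_{k\ge0} \frac{(k+a_1)\,a_1^{\overline{k}}\ldots a_m^{\overline{k}}\;z^{k}}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\;k!} = \sum_{k\ge0} \frac{a_1(a_1+1)^{\overline{k}}\,a_2^{\overline{k}}\ldots a_m^{\overline{k}}\;z^{k}}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\;k!}$$

$$= a_1\,F\!\left({a_1+1,\ a_2,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right).$$

Only one parameter has been shifted.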
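A similar trick works with lower parameters, but in this case things shift down instead of up:

$$(\vartheta + b_1 - 1)\,F\!\left({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right) = \sum_{k\ge0} \frac{(k+b_1-1)\,a_1^{\overline{k}}\ldots a_m^{\overline{k}}\;z^{k}}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}\;k!} = \sum_{k\ge0} \frac{(b_1-1)\,a_1^{\overline{k}}\ldots a_m^{\overline{k}}\;z^{k}}{(b_1-1)^{\overline{k}}\,b_2^{\overline{k}}\ldots b_n^{\overline{k}}\;k!}$$

$$= (b_1-1)\,F\!\left({a_1,\ldots,a_m \atop b_1-1,\ b_2,\ldots,b_n}\,\middle|\,z\right).$$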

We can now combine all these operations and make a mathematical “pun” by expressing the same quantity in two different ways. Namely, we have

$$(\vartheta + a_1)\ldots(\vartheta + a_m)F = a_1\ldots a_m\,F\!\left({a_1+1,\ldots,a_m+1 \atop b_1,\ldots,b_n}\,\middle|\,z\right)$$

and

$$(\vartheta + b_1 - 1)\ldots(\vartheta + b_n - 1)F = (b_1-1)\ldots(b_n-1)\,F\!\left({a_1,\ldots,a_m \atop b_1-1,\ldots,b_n-1}\,\middle|\,z\right),$$

where $F = F(a_1,\ldots,a_m; b_1,\ldots,b_n; z)$. And (5.106) tells us that the top line is the derivative of the bottom line. Therefore the general hypergeometric function F satisfies the differential equation

$$D(\vartheta + b_1 - 1)\ldots(\vartheta + b_n - 1)F = (\vartheta + a_1)\ldots(\vartheta + a_m)F, \qquad (5.107)$$

where D is the operator $\frac{d}{dz}$.

(Margin note: Ever hear the one about the brothers who named their cattle ranch Focus, because it’s where the sons raise meat?)

This cries out for an example. Let’s find the differential equation satisfied by the standard 2-over-1 hypergeometric series F(z) = F(a, b; c; z). According to (5.107), we have

$$D(\vartheta + c - 1)F = (\vartheta + a)(\vartheta + b)F.$$

What does this mean in ordinary notation? Well, $(\vartheta + c - 1)F$ is $zF'(z) + (c-1)F(z)$, and the derivative of this gives the left-hand side,

$$F'(z) + zF''(z) + (c-1)F'(z).$$

On the right-hand side we have

$$(\vartheta + a)\bigl(zF'(z) + bF(z)\bigr) = z\,\frac{d}{dz}\bigl(zF'(z) + bF(z)\bigr) + a\bigl(zF'(z) + bF(z)\bigr)$$

$$= zF'(z) + z^2F''(z) + bzF'(z) + azF'(z) + abF(z).$$

Equating the two sides tells us that

$$z(1-z)F''(z) + \bigl(c - z(a+b+1)\bigr)F'(z) - abF(z) = 0. \qquad (5.108)$$

This equation is equivalent to the factored form (5.107).

(Margin note: The function $F(z) = (1-z)^r$ satisfies $\vartheta F = z(\vartheta - r)F$. This gives another proof of the binomial theorem.)
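Equation (5.108) can be checked coefficient by coefficient. The sketch below is an illustration, not the book's derivation: it builds the first few power-series coefficients $t_k$ of F(a, b; c; z) from the term ratio and confirms that the coefficient of each $z^k$ in the left side of (5.108) vanishes. The function names and sample parameters are arbitrary.

```python
from fractions import Fraction

def coeffs(a, b, c, count):
    """First `count` coefficients t_k of F(a, b; c; z), built from the term ratio."""
    t = [Fraction(1)]
    for k in range(count - 1):
        t.append(t[-1] * (a + k) * (b + k) / ((c + k) * (k + 1)))
    return t

def residual(a, b, c, count):
    """Coefficients of z^k, for k < count-1, in z(1-z)F'' + (c - z(a+b+1))F' - abF."""
    t = coeffs(a, b, c, count)
    out = []
    for k in range(count - 1):
        out.append(
            (k + 1) * k * t[k + 1]        # from z F''
            - k * (k - 1) * t[k]          # from -z^2 F''
            + c * (k + 1) * t[k + 1]      # from c F'
            - (a + b + 1) * k * t[k]      # from -(a+b+1) z F'
            - a * b * t[k]                # from -ab F
        )
    return out

res = residual(Fraction(1, 2), Fraction(-3), Fraction(5, 3), 8)
assert all(r == 0 for r in res)
```

Each residual coefficient collapses to $(k+1)(k+c)\,t_{k+1} - (k+a)(k+b)\,t_k$, which is zero precisely because of the hypergeometric term ratio.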

Conversely, we can go back from the differential equation to the power series. Let’s assume that $F(z) = \sum_{k\ge0} t_k z^k$ is a power series satisfying (5.107). A straightforward calculation shows that we must have

$$\frac{t_{k+1}}{t_k} = \frac{(k+a_1)\ldots(k+a_m)}{(k+b_1)\ldots(k+b_n)(k+1)};$$

hence F(z) must be $t_0\,F(a_1,\ldots,a_m; b_1,\ldots,b_n; z)$. We’ve proved that the hypergeometric series (5.76) is the only formal power series that satisfies the differential equation (5.107) and has the constant term 1.

It would be nice if hypergeometrics solved all the world’s differential equations, but they don’t quite. The right-hand side of (5.107) always expands into a sum of terms of the form $\alpha_k z^k F^{(k)}(z)$, where $F^{(k)}(z)$ is the kth derivative $D^kF(z)$; the left-hand side always expands into a sum of terms of the form $\beta_k z^{k-1} F^{(k)}(z)$ with k > 0. So the differential equation (5.107) always takes the special form

$$z^{n-1}(\beta_n - z\alpha_n)F^{(n)}(z) + \cdots + (\beta_1 - z\alpha_1)F'(z) - \alpha_0F(z) = 0.$$

Equation (5.108) illustrates this in the case n = 2. Conversely, we will prove in exercise 6.13 that any differential equation of this form can be factored in terms of the $\vartheta$ operator, to give an equation like (5.107). So these are the differential equations whose solutions are power series with rational term ratios.

Multiplying both sides of (5.107) by z dispenses with the D operator and gives us an instructive all-$\vartheta$ form,

$$\vartheta(\vartheta + b_1 - 1)\ldots(\vartheta + b_n - 1)F = z(\vartheta + a_1)\ldots(\vartheta + a_m)F. \qquad (5.109)$$

The first factor $\vartheta = (\vartheta + 1 - 1)$ on the left corresponds to the (k+1) in the term ratio (5.81), which corresponds to the k! in the denominator of the kth term in a general hypergeometric series. The other factors $(\vartheta + b_j - 1)$ correspond to the denominator factor $(k + b_j)$, which corresponds to $b_j^{\overline{k}}$ in (5.76). On the right, the z corresponds to $z^k$, and $(\vartheta + a_j)$ corresponds to $a_j^{\overline{k}}$.
One use of this differential theory is to find and prove new transformations. For example, we can readily verify that both of the hypergeometrics

$$F\!\left({2a,\ 2b \atop a+b+\frac12}\,\middle|\,z\right) \qquad\text{and}\qquad F\!\left({a,\ b \atop a+b+\frac12}\,\middle|\,4z(1-z)\right)$$

satisfy the differential equation

$$z(1-z)F''(z) + (a+b+\tfrac12)(1-2z)F'(z) - 4abF(z) = 0;$$

hence Gauss’s identity [116, equation 102]

$$F\!\left({2a,\ 2b \atop a+b+\frac12}\,\middle|\,z\right) = F\!\left({a,\ b \atop a+b+\frac12}\,\middle|\,4z(1-z)\right) \qquad (5.110)$$

must be true. In particular,

$$F\!\left({2a,\ 2b \atop a+b+\frac12}\,\middle|\,\frac12\right) = F\!\left({a,\ b \atop a+b+\frac12}\,\middle|\,1\right), \qquad (5.111)$$

whenever both infinite sums converge.
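When a is a negative integer both sides of (5.110) are polynomials in z, so the identity can be tested exactly. The following sketch does this for the arbitrary choice a = -1, b = 2; the helper `hyp` is a hypothetical name and handles only terminating series.

```python
from fractions import Fraction

def hyp(upper, lower, z):
    """Exact sum of a terminating hypergeometric series F(upper; lower; z)."""
    total, term, k = Fraction(0), Fraction(1), 0
    while term != 0:
        total += term
        for a in upper:
            term *= a + k
        if term == 0:  # an upper factor vanished: the series stops here
            break
        for b in lower:
            term /= b + k
        term = term * z / (k + 1)
        k += 1
    return total

a, b = -1, Fraction(2)
c = a + b + Fraction(1, 2)  # common lower parameter a + b + 1/2

for z in (Fraction(1, 3), Fraction(-2, 5), Fraction(1, 2), Fraction(3)):
    left = hyp([2 * a, 2 * b], [c], z)
    right = hyp([a, b], [c], 4 * z * (1 - z))
    assert left == right
```

For these parameters both sides reduce to the polynomial $1 - \frac{16}{3}z + \frac{16}{3}z^2$, so the agreement holds for every z, including values with |z| > 1/2.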

Every new identity for hypergeometrics has consequences for binomial coefficients, and this one is no exception. Let’s consider the sum

$$\sum_{k\le m} \binom{m-k}{n}\binom{m+n+1}{k}\left(\frac{-1}{2}\right)^{k}, \qquad \text{integers } m \ge n \ge 0.$$

The terms are nonzero for $0 \le k \le m-n$, and with a little delicate limit-taking as before we can express this sum as the hypergeometric

$$\lim_{\epsilon\to0} \binom{m}{n}\,F\!\left({n-m,\ -n-m-1+\alpha\epsilon \atop -m+\epsilon}\,\middle|\,\frac12\right).$$

The value of $\alpha$ doesn’t affect the limit, since the nonpositive upper parameter n-m cuts the sum off early. We can set $\alpha = 2$, so that (5.111) applies. The limit can now be evaluated because the right-hand side is a special case of (5.92). The result can be expressed in simplified form,

$$\sum_{k\le m} \binom{m-k}{n}\binom{m+n+1}{k}\left(\frac{-1}{2}\right)^{k} = \binom{(m+n)/2}{n}\,2^{n-m}\,[m+n \text{ is even}], \qquad \text{integers } m \ge n \ge 0, \qquad (5.112)$$

as shown in exercise 54. For example, when m = 5 and n = 2 we get $\binom52\binom80 - \binom42\binom81/2 + \binom32\binom82/4 - \binom22\binom83/8 = 10 - 24 + 21 - 7 = 0$; when m = ...


( Caution:  We  can’t 
use  (5.110)  safely 
when  |z|  >1/2, 
unless  both  sides 
are  polynomials; 
see  exercise  53.) 
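Identity (5.112) is easy to spot-check by brute force. The sketch below is only an illustration with hypothetical function names; it compares both sides exactly for a range of small m and n.

```python
from fractions import Fraction
from math import comb

def lhs(m, n):
    """Sum over 0 <= k <= m of C(m-k, n) C(m+n+1, k) (-1/2)^k."""
    return sum(
        comb(m - k, n) * comb(m + n + 1, k) * Fraction(-1, 2) ** k
        for k in range(m + 1)
    )

def rhs(m, n):
    """C((m+n)/2, n) 2^(n-m) when m+n is even, and 0 otherwise."""
    if (m + n) % 2:
        return Fraction(0)
    return comb((m + n) // 2, n) * Fraction(2) ** (n - m)

for n in range(6):
    for m in range(n, 10):
        assert lhs(m, n) == rhs(m, n)

print(lhs(5, 2), lhs(4, 2))
```

The case m = 5, n = 2 reproduces the value 0 computed in the text.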
We can also find cases where (5.110) gives binomial sums when z = -1, but these are really weird. If we set $a = \frac16 - \frac{n}{3}$ and b = -n, we get the monstrous formula

$$F\!\left({\frac13-\frac23 n,\ -2n \atop \frac23-\frac43 n}\,\middle|\,-1\right) = F\!\left({\frac16-\frac13 n,\ -n \atop \frac23-\frac43 n}\,\middle|\,-8\right).$$

These hypergeometrics are nondegenerate polynomials when $n \not\equiv 2 \pmod 3$; and the parameters have been cleverly chosen so that the left-hand side can be evaluated by (5.94). We are therefore led to a truly mind-boggling result,

$$\sum_k \binom{n}{k}\binom{\frac13 n-\frac16}{k}\,8^k \bigg/ \binom{\frac43 n-\frac23}{k} = \binom{2n}{n} \bigg/ \binom{\frac43 n-\frac23}{n}, \qquad \text{integer } n \ge 0,\ n \not\equiv 2 \pmod 3. \qquad (5.113)$$

This is the most startling identity in binomial coefficients that we’ve seen. Small cases of the identity aren’t even easy to check by hand. (It turns out that both sides do give $\frac{81}{7}$ when n = 3.) But the identity is completely useless, of course; surely it will never arise in a practical problem.

(Margin note: The only use of (5.113) is to demonstrate the existence of incredibly useless identities.)
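Checking small cases by machine is less painful than by hand. This sketch is an illustration only; `binom` is a hypothetical helper that extends binomial coefficients to a rational upper index through the falling-power definition, and the cases n = 0, 1, 3 are the ones tried here.

```python
from fractions import Fraction

def binom(x, k):
    """Binomial coefficient x(x-1)...(x-k+1)/k! for rational x, integer k >= 0."""
    result = Fraction(1)
    for i in range(k):
        result = result * (x - i) / (i + 1)
    return result

def lhs(n):
    """Left side of (5.113)."""
    x = Fraction(n, 3) - Fraction(1, 6)
    y = Fraction(4 * n, 3) - Fraction(2, 3)
    return sum(binom(n, k) * binom(x, k) * 8 ** k / binom(y, k) for k in range(n + 1))

def rhs(n):
    """Right side of (5.113)."""
    y = Fraction(4 * n, 3) - Fraction(2, 3)
    return binom(2 * n, n) / binom(y, n)

for n in (0, 1, 3):
    assert lhs(n) == rhs(n)

print(lhs(3))
```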

So  that’s  our  hype  for  hypergeometrics.  We’ve  seen  that  hypergeometric 
series  provide  a high-level  way  to  understand  what’s  going  on  in  binomial 
coefficient  sums.  A great  deal  of  additional  information  can  be  found  in  the 
classic  book  by  Wilfred  N.  Bailey  [15]  and  its  sequel  by  Lucy  Joan  Slater  [269]. 


5.7  PARTIAL  HYPERGEOMETRIC  SUMS 

Most of the sums we’ve evaluated in this chapter range over all indices $k \ge 0$, but sometimes we’ve been able to find a closed form that works over a general range $0 \le k < m$. For example, we know from (5.16) that

$$\sum_{k<m} \binom{n}{k}(-1)^k = (-1)^{m-1}\binom{n-1}{m-1}, \qquad \text{integer } m. \qquad (5.114)$$

The theory in Chapter 2 gives us a nice way to understand formulas like this: If $f(k) = \Delta g(k) = g(k+1) - g(k)$, then we’ve agreed to write $\sum f(k)\,\delta k = g(k) + C$, and

$$\sum\nolimits_a^b f(k)\,\delta k = g(k)\Big|_a^b = g(b) - g(a).$$

Furthermore, when a and b are integers with $a \le b$, we have

$$\sum\nolimits_a^b f(k)\,\delta k = \sum_{a\le k<b} f(k) = g(b) - g(a).$$

Therefore identity (5.114) corresponds to the indefinite summation formula

$$\sum \binom{n}{k}(-1)^k\,\delta k = (-1)^{k-1}\binom{n-1}{k-1} + C$$

and to the difference formula

$$\Delta\left((-1)^k\binom{n}{k}\right) = (-1)^{k+1}\binom{n+1}{k+1}.$$

It’s easy to start with a function g(k) and to compute $\Delta g(k) = f(k)$, a function whose sum will be g(k) + C. But it’s much harder to start with f(k) and to figure out its indefinite sum $\sum f(k)\,\delta k = g(k) + C$; this function g might not have a simple form. For example, there is apparently no simple form for $\sum \binom{n}{k}\,\delta k$; otherwise we could evaluate sums like $\sum_{k\le n/3}\binom{n}{k}$, about which we’re clueless.
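Both (5.114) and the accompanying difference formula can be confirmed for small cases with integer arithmetic. The sketch below is an illustration restricted to n ≥ 1 and m ≥ 1, since `math.comb` does not accept negative arguments; the function names are hypothetical.

```python
from math import comb

def partial_sum(n, m):
    """Sum over 0 <= k < m of C(n,k) (-1)^k."""
    return sum((-1) ** k * comb(n, k) for k in range(m))

def closed_form(n, m):
    """(-1)^(m-1) C(n-1, m-1), valid here for n >= 1 and m >= 1."""
    return (-1) ** (m - 1) * comb(n - 1, m - 1)

def difference(n, k):
    """Delta applied to (-1)^k C(n,k), that is, g(k+1) - g(k)."""
    return (-1) ** (k + 1) * comb(n, k + 1) - (-1) ** k * comb(n, k)

for n in range(1, 9):
    for m in range(1, 11):
        assert partial_sum(n, m) == closed_form(n, m)
    for k in range(0, 10):
        assert difference(n, k) == (-1) ** (k + 1) * comb(n + 1, k + 1)
```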

In 1977, R. W. Gosper [124] discovered a beautiful way to decide whether a given function is indefinitely summable with respect to a general class of functions called hypergeometric terms. Let us write

$$F\!\left({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right)_k = \frac{a_1^{\overline{k}}\ldots a_m^{\overline{k}}}{b_1^{\overline{k}}\ldots b_n^{\overline{k}}}\,\frac{z^k}{k!} \qquad (5.115)$$

for the kth term of the hypergeometric series $F(a_1,\ldots,a_m; b_1,\ldots,b_n; z)$. We will regard $F(a_1,\ldots,a_m; b_1,\ldots,b_n; z)_k$ as a function of k, not of z. Gosper’s decision procedure allows us to decide if there exist parameters c, $A_1,\ldots,A_M$, $B_1,\ldots,B_N$, and Z such that

$$\sum F\!\left({a_1,\ldots,a_m \atop b_1,\ldots,b_n}\,\middle|\,z\right)_k \delta k = c\,F\!\left({A_1,\ldots,A_M \atop B_1,\ldots,B_N}\,\middle|\,Z\right)_k + C, \qquad (5.116)$$

given $a_1,\ldots,a_m$, $b_1,\ldots,b_n$, and z. We will say that a given function $F(a_1,\ldots,a_m; b_1,\ldots,b_n; z)_k$ is summable in hypergeometric terms if such constants c, $A_1,\ldots,A_M$, $B_1,\ldots,B_N$, Z exist.

Let’s write t(k) and T(k) as abbreviations for $F(a_1,\ldots,a_m; b_1,\ldots,b_n; z)_k$ and $F(A_1,\ldots,A_M; B_1,\ldots,B_N; Z)_k$, respectively. The first step in Gosper’s decision procedure is to express the term ratio

$$\frac{t(k+1)}{t(k)} = \frac{(k+a_1)\ldots(k+a_m)\,z}{(k+b_1)\ldots(k+b_n)(k+1)}$$

in the special form

$$\frac{t(k+1)}{t(k)} = \frac{p(k+1)}{p(k)}\,\frac{q(k)}{r(k+1)}, \qquad (5.117)$$

where p, q, and r are polynomials subject to the following condition:

$$(k+\alpha)\backslash q(k) \ \text{ and } \ (k+\beta)\backslash r(k) \implies \alpha - \beta \text{ is not a positive integer.} \qquad (5.118)$$

(Margin note: Divisibility of polynomials is analogous to divisibility of integers. For example, $(k+\alpha)\backslash q(k)$ means that the quotient $q(k)/(k+\alpha)$ is a polynomial. It’s well known that $(k+\alpha)\backslash q(k)$ if and only if $q(-\alpha) = 0$.)

This condition is easy to achieve: We start by provisionally setting p(k) = 1, $q(k) = (k+a_1)\ldots(k+a_m)z$, and $r(k) = (k+b_1-1)\ldots(k+b_n-1)k$; then we check if (5.118) is violated. If q and r have factors $(k+\alpha)$ and $(k+\beta)$ where $\alpha - \beta = N > 0$, we divide them out of q and r and replace p(k) by

$$p(k)\,(k+\alpha-1)^{\underline{N-1}} = p(k)\,(k+\alpha-1)(k+\alpha-2)\ldots(k+\beta+1).$$

The new p, q, and r still satisfy (5.117), and we can repeat this process until (5.118) holds.

(Margin note: Exercise 55 explains why we might want to make this magic substitution.)

Our goal is to find a hypergeometric term T(k) such that

$$t(k) = c\,T(k+1) - c\,T(k) \qquad (5.119)$$

for some constant c. Let’s write

$$c\,T(k) = \frac{r(k)\,s(k)\,t(k)}{p(k)}, \qquad (5.120)$$

where s(k) is a secret function that must be discovered somehow. Plugging (5.120) into (5.117) and (5.119) gives us the equation that s(k) must satisfy:

$$p(k) = q(k)\,s(k+1) - r(k)\,s(k). \qquad (5.121)$$

If we can find s(k) satisfying this recurrence, we’ve found $\sum t(k)\,\delta k$.
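A small worked instance may help here; it is an illustration chosen for this sketch, not an example taken from the text at this point. Take $t(k) = \binom{n}{k}(-1)^k$ with a positive integer n. Its term ratio is (k-n)/(k+1), so the provisional choice p(k) = 1, q(k) = k-n, r(k) = k already meets condition (5.118), and the constant s(k) = -1/n solves (5.121). The code checks that the resulting cT(k) = r(k)s(k)t(k)/p(k) telescopes to t(k); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def t(n, k):
    """The summand t(k) = C(n,k) (-1)^k, for k >= 0."""
    return Fraction((-1) ** k * comb(n, k))

def cT(n, k):
    """cT(k) = r(k) s(k) t(k) / p(k) with p = 1, r(k) = k, s(k) = -1/n."""
    return k * Fraction(-1, n) * t(n, k)

for n in range(1, 7):
    for k in range(0, n + 2):
        # (5.121): p(k) = q(k) s(k+1) - r(k) s(k), with s constant
        assert (k - n) * Fraction(-1, n) - k * Fraction(-1, n) == 1
        # (5.119): t(k) = cT(k+1) - cT(k)
        assert cT(n, k + 1) - cT(n, k) == t(n, k)
```

Here cT(k) simplifies to $(-1)^{k-1}\binom{n-1}{k-1}$, the same indefinite sum that corresponds to (5.114).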

We’re assuming that T(k+1)/T(k) is a rational function of k. Therefore, by (5.120) and (5.119), $r(k)s(k)/p(k) = T(k)/\bigl(T(k+1) - T(k)\bigr)$ is a rational function of k, and s(k) itself must be a quotient of polynomials:

$$s(k) = f(k)/g(k). \qquad (5.122)$$

But in fact we can prove that s(k) is itself a polynomial. For if $g(k) \ne 1$, and if f(k) and g(k) have no common factors, let N be the largest integer such that $(k+\beta)$ and $(k+\beta+N-1)$ both occur as factors of g(k) for some complex number $\beta$. The value of N is positive, since N = 1 always satisfies this condition. Equation (5.121) can be rewritten

$$p(k)\,g(k+1)\,g(k) = q(k)\,f(k+1)\,g(k) - r(k)\,g(k+1)\,f(k),$$

and if we set $k = -\beta$ and $k = -\beta - N$ we get

$$r(-\beta)\,g(1-\beta)\,f(-\beta) = 0 = q(-\beta-N)\,f(1-\beta-N)\,g(-\beta-N).$$

Now $f(-\beta) \ne 0$ and $f(1-\beta-N) \ne 0$, because f and g have no common roots. Also $g(1-\beta) \ne 0$ and $g(-\beta-N) \ne 0$, because g(k) would otherwise contain the factor $(k+\beta-1)$ or $(k+\beta+N)$, contrary to the maximality of N. Therefore

$$r(-\beta) = q(-\beta-N) = 0.$$

But this contradicts condition (5.118). Hence s(k) must be a polynomial.

The remaining task is to decide whether there exists a polynomial s(k) satisfying (5.121), when p(k), q(k), and r(k) are given polynomials. It's easy to decide this for polynomials of any particular degree d, since we can write

$$s(k) = \alpha_d k^d + \alpha_{d-1}k^{d-1} + \cdots + \alpha_0, \qquad \alpha_d \ne 0$$

for unknown coefficients $(\alpha_d, \ldots, \alpha_0)$ and plug this expression into the defining equation. The polynomial s(k) will satisfy the recurrence if and only if the α's satisfy certain linear equations, because each power of k must have the same coefficient on both sides of (5.121).

But how can we determine the degree of s? It turns out that there actually are at most two possibilities. We can rewrite (5.121) in the form

$$2p(k) = Q(k)\bigl(s(k+1)+s(k)\bigr) + R(k)\bigl(s(k+1)-s(k)\bigr), \tag{5.123}$$

where $Q(k) = q(k) - r(k)$ and $R(k) = q(k) + r(k)$. If s(k) has degree d, then the sum $s(k+1)+s(k) = 2\alpha_d k^d + \cdots$ also has degree d, while the difference $s(k+1)-s(k) = \Delta s(k) = d\alpha_d k^{d-1} + \cdots$ has degree d − 1. (The zero polynomial can be assumed to have degree −1.) Let's write deg(p) for the degree of a polynomial p. If deg(Q) ≥ deg(R), then the degree of the right-hand side of (5.123) is deg(Q) + d, so we must have d = deg(p) − deg(Q). On the other hand if deg(Q) < deg(R) = d′, we can write $Q(k) = \beta k^{d'-1} + \cdots$ and $R(k) = \gamma k^{d'} + \cdots$ where γ ≠ 0; the right-hand side of (5.123) has the form

$$(2\beta\alpha_d + \gamma d\alpha_d)\,k^{d+d'-1} + \cdots.$$

Ergo, two possibilities: Either 2β + γd ≠ 0, and d = deg(p) − deg(R) + 1; or 2β + γd = 0, and d > deg(p) − deg(R) + 1. The second case needs to be examined only if −2β/γ is an integer d greater than deg(p) − deg(R) + 1.

Thus we have enough facts to decide if a suitable polynomial s(k) exists. If so, we can plug it into (5.120) and we have our T. If not, we've proved that $\sum t(k)\,\delta k$ is not a hypergeometric term.
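
The degree rule above is mechanical enough to sketch in Python. The representation is an assumption of this sketch: a polynomial is a list of coefficients, lowest power first, and the helper names `trim`, `deg`, `combine`, and `degree_candidates` are hypothetical. The function returns the nonnegative degrees d worth trying for s(k); an empty list means no polynomial s(k) can exist.

```python
from fractions import Fraction

def trim(poly):
    """Drop trailing zero coefficients (highest powers)."""
    poly = list(poly)
    while poly and poly[-1] == 0:
        poly.pop()
    return poly

def deg(poly):
    """Degree of a coefficient list; the zero polynomial has degree -1."""
    return len(trim(poly)) - 1

def combine(a, b, sign):
    """Return a + sign*b as a trimmed list of Fractions."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([Fraction(x) + sign * Fraction(y) for x, y in zip(a, b)])

def degree_candidates(p, q, r):
    """Possible degrees of s(k) in (5.121), following the two cases of (5.123)."""
    Q = combine(q, r, -1)          # Q(k) = q(k) - r(k)
    R = combine(q, r, +1)          # R(k) = q(k) + r(k)
    if deg(Q) >= deg(R):
        candidates = [deg(p) - deg(Q)]
    else:
        d_prime = deg(R)
        first = deg(p) - d_prime + 1
        candidates = [first]
        # beta is the coefficient of k^(d'-1) in Q; it may be zero.
        beta = Q[d_prime - 1] if 0 <= d_prime - 1 < len(Q) else Fraction(0)
        gamma = R[d_prime]
        second = -2 * beta / gamma
        if second.denominator == 1 and second > first:
            candidates.append(int(second))
    return [d for d in candidates if d >= 0]
```

With p(k) = 1, q(k) = k − 3, r(k) = k the sketch reports the two candidates 0 and 3; with q(k) = 3 − k instead it reports none, because the only candidate would be −1.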




Why  isn’t  it 
r(k)  = k + 1 ? 
Oh,  I see. 


Time for an example. Let's try the partial sum (5.114); Gosper's method should be able to deduce the value of

$$\sum \binom{n}{k}(-1)^k\,\delta k$$

for  any  fixed  n.  Ignoring  factors  that  don’t  involve  k,  we  want  the  sum  of 


t(k)  = F 


The first step is to put the term ratio into the required form (5.117); we have

$$\frac{t(k+1)}{t(k)} = \frac{(k-n)}{(k+1)} = \frac{p(k+1)\,q(k)}{p(k)\,r(k+1)},$$


so we simply take p(k) = 1, q(k) = k − n, and r(k) = k. This choice of p, q, and r satisfies (5.118), unless n is a negative integer; let's suppose it isn't. According to (5.123), we should consider the polynomials Q(k) = −n and R(k) = 2k − n. Since R has larger degree than Q, we need to look at two cases. Either d = deg(p) − deg(R) + 1, which is 0; or d = −2β/γ where β = −n and γ = 2, hence d = n. The first case is nicer, so let's try it first: Equation (5.121) is

$$1 = (k-n)\alpha_0 - k\alpha_0$$

and so we choose $\alpha_0 = -1/n$. This satisfies the required conditions and gives


$$cT(k) = \frac{r(k)\,s(k)\,t(k)}{p(k)} = k\cdot\Bigl(\frac{-1}{n}\Bigr)\cdot\binom{n}{k}(-1)^k = \binom{n-1}{k-1}(-1)^{k-1}, \qquad n \ne 0,$$

which  is  the  answer  we  were  hoping  to  confirm. 
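
A quick numerical check of this example is easy to write down. The sketch below is ours, not the text's: it takes $t(k) = \binom{n}{k}(-1)^k$ as the summand, uses s(k) = −1/n with p(k) = 1, q(k) = k − n, r(k) = k, and confirms both (5.121) and the telescoping property (5.119) for small positive n. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binom(n, k):
    """Binomial coefficient for integer n >= 0; zero when k is negative."""
    return comb(n, k) if k >= 0 else 0

def t(n, k):
    """Summand of the example: binom(n, k) * (-1)^k."""
    return (-1) ** k * binom(n, k)

def cT(n, k):
    """r(k) s(k) t(k) / p(k) with r(k) = k, s(k) = -1/n, p(k) = 1."""
    return Fraction(-k, n) * t(n, k)

def closed_form(n, k):
    """(-1)^(k-1) * binom(n-1, k-1), written without a negative exponent."""
    return -((-1) ** k) * binom(n - 1, k - 1)

def check_example(n_max=8):
    for n in range(1, n_max + 1):
        s = Fraction(-1, n)
        for k in range(0, n + 3):
            assert (k - n) * s - k * s == 1            # equation (5.121)
            assert cT(n, k + 1) - cT(n, k) == t(n, k)  # equation (5.119)
            assert cT(n, k) == closed_form(n, k)
    return True
```

Calling `check_example()` runs through n = 1, …, 8 and returns `True` when every assertion holds.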

If we apply the same method to find the indefinite sum $\sum \binom{n}{k}\,\delta k$, without the $(-1)^k$, everything will be almost the same except that q(k) will be n − k; hence Q(k) = n − 2k will have greater degree than R(k) = n, and we will conclude that d has the impossible value deg(p) − deg(Q) = −1. Therefore the function $\binom{n}{k}$ is not summable in hypergeometric terms.

However, once we have eliminated the impossible, whatever remains, however improbable, must be the truth (according to S. Holmes [70]). When we defined p, q, and r we decided to ignore the possibility that n might be a negative integer. What if it is? Let's set n = −N, where N is positive. Then the term ratio for $\sum \binom{n}{k}\,\delta k$ is

$$\frac{t(k+1)}{t(k)} = \frac{-(k+N)}{(k+1)} = \frac{p(k+1)\,q(k)}{p(k)\,r(k+1)}$$

and it should be represented by $p(k) = (k+1)^{\overline{N-1}}$, $q(k) = -1$, $r(k) = 1$. Gosper's method now tells us to look for a polynomial s(k) of degree d = N − 1; maybe there's hope after all. For example, when N = 2 we want to solve

$$k+1 = -\bigl((k+1)\alpha_1 + \alpha_0\bigr) - (k\alpha_1 + \alpha_0).$$

Equating coefficients of k and 1 tells us that

$$1 = -\alpha_1 - \alpha_1; \qquad 1 = -\alpha_1 - \alpha_0 - \alpha_0;$$

hence $s(k) = -\frac12 k - \frac14$ is a solution, and

$$cT(k) = \frac{1\cdot\bigl(-\frac12 k - \frac14\bigr)\cdot\binom{-2}{k}}{k+1} = (-1)^{k-1}\,\frac{2k+1}{4}.$$

Can this be the desired sum? Yes, it checks out:

$$(-1)^{k}\,\frac{2k+3}{4} - (-1)^{k-1}\,\frac{2k+1}{4} = (-1)^k(k+1) = \binom{-2}{k}.$$

We can write the summation formula in another form,

$$\sum_{k<m}\binom{-2}{k} = (-1)^{k-1}\,\frac{2k+1}{4}\,\bigg|_0^m = (-1)^{m-1}\Bigl\lceil\frac{m}{2}\Bigr\rceil.$$

"Excellent, Holmes!"
"Elementary, my dear Watson."

This representation conceals the fact that $\binom{-2}{k}$ is summable in hypergeometric terms, because $\lceil m/2\rceil$ is not a hypergeometric term.
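
Both forms of this result are easy to test numerically. The following sketch is only a sanity check: it assumes the standard definition of $\binom{r}{k}$ as a falling power divided by k!, and the function names are hypothetical.

```python
from fractions import Fraction

def binom(r, k):
    """General binomial coefficient r(r-1)...(r-k+1)/k! for integer k."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for j in range(k):
        result = result * (r - j) / (j + 1)
    return result

def antidifference(k):
    """The term (-1)^(k-1) (2k+1)/4 found by Gosper's method for N = 2."""
    return Fraction(-((-1) ** k) * (2 * k + 1), 4)

def partial_sum(m):
    """Sum of binom(-2, k) over 0 <= k < m."""
    return sum(binom(-2, k) for k in range(m))

def ceiling_form(m):
    """(-1)^(m-1) * ceil(m/2), using integer arithmetic for the ceiling."""
    return -((-1) ** m) * ((m + 1) // 2)
```

For every m ≥ 0 the three quantities `partial_sum(m)`, `antidifference(m) - antidifference(0)`, and `ceiling_form(m)` should agree; for m = 3, for example, all three are 1 − 2 + 3 = 2.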

A catalog of summable hypergeometric terms makes a useful addition to the database of hypergeometric sums mentioned earlier in this chapter. Let's try to compile a list of the sums-in-hypergeometric-terms that we know. The geometric series $\sum z^k\,\delta k$ is a very special case, which can be written $\sum z^k\,\delta k = (z-1)^{-1}z^k + C$ or

$$\sum F\Bigl({1,\,1\atop 1}\Bigm| z\Bigr)_k\,\delta k = \frac{1}{z-1}\,F\Bigl({1,\,1\atop 1}\Bigm| z\Bigr)_k + C. \tag{5.124}$$
We also computed $\sum kz^k\,\delta k$ in Chapter 2. This summand is zero when k = 0, so we get a more suitable hypergeometric term by considering the sum $\sum (k+1)z^k\,\delta k$ instead. The appropriate formula turns out to be

$$\sum F\Bigl({2,\,1\atop 1}\Bigm| z\Bigr)_k\,\delta k = \frac{-1}{(1-z)^2}\,F\Bigl({1+1/(1-z),\,1\atop 1/(1-z)}\Bigm| z\Bigr)_k \tag{5.125}$$

in hypergeometric notation.

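There's also the formula $\sum \binom{k}{n}\,\delta k = \binom{k}{n+1}$, equation (5.10); we write it $\sum \binom{k+n+1}{n}\,\delta k = \binom{k+n+1}{n+1}$, to avoid division by zero, and get

$$\sum F\Bigl({n+2,\,1\atop 2}\Bigm| 1\Bigr)_k\,\delta k = \frac{1}{n+1}\,F\Bigl({n+2,\,1\atop 1}\Bigm| 1\Bigr)_k, \qquad n \ne -1. \tag{5.126}$$

Identity (5.9) turns out to be equivalent to this, when we express it hypergeometrically.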

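In general if we have a summation formula of the form

$$\sum F\Bigl({a_1,\ldots,a_m,\,1\atop b_1,\ldots,b_n}\Bigm| z\Bigr)_k\,\delta k = c\,F\Bigl({A_1,\ldots,A_M,\,1\atop B_1,\ldots,B_N}\Bigm| z\Bigr)_k, \tag{5.127}$$

then we also have

$$\sum F\Bigl({a_1,\ldots,a_m,\,1\atop b_1,\ldots,b_n}\Bigm| z\Bigr)_{k+l}\,\delta k = c\,F\Bigl({A_1,\ldots,A_M,\,1\atop B_1,\ldots,B_N}\Bigm| z\Bigr)_{k+l}$$

for any integer l. There's a general formula for shifting the index by l:

$$F\Bigl({a_1,\ldots,a_m\atop b_1,\ldots,b_n}\Bigm| z\Bigr)_{k+l} = \frac{a_1^{\overline{l}}\ldots a_m^{\overline{l}}\,z^l}{b_1^{\overline{l}}\ldots b_n^{\overline{l}}\,l!}\,F\Bigl({a_1+l,\ldots,a_m+l,\,1\atop b_1+l,\ldots,b_n+l,\,1+l}\Bigm| z\Bigr)_k.$$

Hence any given identity (5.127) has an infinite number of shifted forms:

$$\sum F\Bigl({a_1+l,\ldots,a_m+l,\,1\atop b_1+l,\ldots,b_n+l}\Bigm| z\Bigr)_k\,\delta k = c\,\frac{b_1^{\overline{l}}\ldots b_n^{\overline{l}}\,A_1^{\overline{l}}\ldots A_M^{\overline{l}}}{a_1^{\overline{l}}\ldots a_m^{\overline{l}}\,B_1^{\overline{l}}\ldots B_N^{\overline{l}}}\,F\Bigl({A_1+l,\ldots,A_M+l,\,1\atop B_1+l,\ldots,B_N+l}\Bigm| z\Bigr)_k. \tag{5.128}$$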


There's usually a fair amount of cancellation among the a's, A's, b's, and B's here. For example, if we apply this shift formula to (5.126), we get the general identity

$$\sum F\Bigl({n+l+2,\,1\atop l+2}\Bigm| 1\Bigr)_k\,\delta k = \frac{l+1}{n+1}\,F\Bigl({n+l+2,\,1\atop l+1}\Bigm| 1\Bigr)_k, \tag{5.129}$$

valid for all n ≠ −1. The shifted version of (5.125) is

$$\sum F\Bigl({l+2,\,1\atop l+1}\Bigm| z\Bigr)_k\,\delta k = \frac{-1}{1-z}\,\frac{l+1/(1-z)}{l+1}\,F\Bigl({l+1+1/(1-z),\,1\atop l+1/(1-z)}\Bigm| z\Bigr)_k. \tag{5.130}$$
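
Identities of this kind can be spot-checked with exact rational arithmetic. The sketch below rests on one assumption about notation: it reads $F(a_1,\ldots,a_m;\,b_1,\ldots,b_n;\,z)_k$ as the k-th term of the hypergeometric series, namely $a_1^{\overline{k}}\ldots a_m^{\overline{k}}\,z^k/(b_1^{\overline{k}}\ldots b_n^{\overline{k}}\,k!)$. The names `rising`, `hyp_term`, and `shifted` are hypothetical.

```python
from fractions import Fraction

def rising(x, k):
    """Rising factorial power x(x+1)...(x+k-1)."""
    result = Fraction(1)
    for j in range(k):
        result *= x + j
    return result

def hyp_term(upper, lower, z, k):
    """k-th term of the hypergeometric series with the given parameters."""
    num = Fraction(z) ** k
    for a in upper:
        num *= rising(a, k)
    den = rising(1, k)                      # k!
    for b in lower:
        den *= rising(b, k)
    return num / den

def shifted(upper, lower, z, k, l):
    """Right-hand side of the index-shifting formula for the term k + l."""
    coeff = Fraction(z) ** l / rising(1, l)
    for a in upper:
        coeff *= rising(a, l)
    for b in lower:
        coeff /= rising(b, l)
    new_upper = [a + l for a in upper] + [1]
    new_lower = [b + l for b in lower] + [1 + l]
    return coeff * hyp_term(new_upper, new_lower, z, k)
```

With these helpers, an identity such as (5.129) holds when T(k+1) − T(k) equals the summand, where T(k) is (l+1)/(n+1) times the term with lower parameter l + 1. Trying a handful of small n, l, and k is not a proof, but it catches sign and index slips quickly.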


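With a bit of patience, we can compute a few more indefinite summation identities that are potentially useful:

$$\sum F\Bigl({a,\ 2+(1-a)z/(1-z),\ 1\atop 1+(1-a)z/(1-z),\ 2}\Bigm| z\Bigr)_k\,\delta k = \frac{1}{az-1}\,F\Bigl({a,\,1\atop 1}\Bigm| z\Bigr)_k; \tag{5.131}$$

$$\sum F\Bigl({a,\ b,\ 1+(c-ab)/(c-a-b+1),\ 1\atop c+1,\ (c-ab)/(c-a-b+1),\ 2}\Bigm| 1\Bigr)_k\,\delta k = \frac{c}{ab-c}\,F\Bigl({a,\,b,\,1\atop c,\,1}\Bigm| 1\Bigr)_k; \tag{5.132}$$

$$\sum F\Bigl({a,\ b,\ 1\atop c+1,\ a+b-c+1}\Bigm| 1\Bigr)_k\,\delta k = \frac{c\,(a+b-c)}{(c-a)(c-b)}\,F\Bigl({a,\ b,\ 1\atop c,\ a+b-c}\Bigm| 1\Bigr)_k. \tag{5.133}$$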


Exercises 

Warmups 

1 What is $11^4$? Why is this number easy to compute, for a person who knows binomial coefficients?

2 For which value(s) of k is $\binom{n}{k}$ a maximum, when n is a given positive integer? Prove your answer.

3 Prove the hexagon property, $\binom{n-1}{k-1}\binom{n}{k+1}\binom{n+1}{k} = \binom{n-1}{k}\binom{n+1}{k+1}\binom{n}{k-1}$.

4 Evaluate $\binom{-1}{k}$ by negating (actually un-negating) its upper index.

5 Let p be prime. Show that $\binom{p}{k} \bmod p = 0$ for 0 < k < p. What does this imply about the binomial coefficients $\binom{p-1}{k}$?

6 Fix up the text's derivation in Problem 6, Section 5.2, by correctly applying symmetry.

A case of mistaken identity.

7 Is (5.34) true also when k < 0?
(Here t and T aren't necessarily related as in (5.119).)

8 Evaluate $\sum_k \binom{n}{k}(-1)^k(1-k/n)^n$. What is the approximate value of this sum, when n is very large? Hint: This sum is $\Delta^n f(0)$ for some function f.

9 Show that the generalized exponentials of (5.58) obey the law

$$\mathcal{E}_t(z) = \mathcal{E}(tz)^{1/t}, \qquad \text{if } t \ne 0,$$

where $\mathcal{E}(z)$ is an abbreviation for $\mathcal{E}_1(z)$.

10 Show that $-2(\ln(1-z)+z)/z^2$ is a hypergeometric function.

11 Express the two functions

$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \cdots$$

$$\arcsin z = z + \frac{1\cdot z^3}{2\cdot 3} + \frac{1\cdot 3\cdot z^5}{2\cdot 4\cdot 5} + \frac{1\cdot 3\cdot 5\cdot z^7}{2\cdot 4\cdot 6\cdot 7} + \cdots$$

in terms of hypergeometric series.

12 Which of the following functions of k is a “hypergeometric term,” in the sense of (5.115)? Explain why or why not.
a $n^k$.
b $k^n$.
c $(k! + (k+1)!)/2$.
d $H_k$, that is, $\frac11 + \frac12 + \cdots + \frac1k$.
e $t(k)T(n-k)/T(n)$, when t and T are hypergeometric terms.
f $(t(k) + T(k))/2$, when t and T are hypergeometric terms.
g $(at(k) + bt(k+1) + ct(k+2))/(a + bt(1) + ct(2))$, when t is a hypergeometric term.


Basics 

13 Find relations between the superfactorial function $P_n = \prod_{k=1}^n k!$ of exercise 4.55, the hyperfactorial function $Q_n = \prod_{k=1}^n k^k$, and the product $R_n = \prod_{k=0}^n \binom{n}{k}$.

14 Prove identity (5.25) by negating the upper index in Vandermonde's convolution (5.22). Then show that another negation yields (5.26).

15 What is $\sum_k \binom{n}{k}^3(-1)^k$? Hint: See (5.29).

16  Evaluate  the  sum 


(-Dk 


when  a,  b,  c are  nonnegative  integers. 

17 Find a simple relation between $\binom{2n-1/2}{n}$ and $\binom{2n-1/2}{2n}$.

18 Find an alternative form analogous to (5.35) for the product

$$\binom{r}{k}\binom{r-1/3}{k}\binom{r-2/3}{k}.$$

19 Show that the generalized binomials of (5.58) obey the law

$$\mathcal{B}_t(z) = \mathcal{B}_{1-t}(-z)^{-1}.$$


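20 Define a “generalized bloopergeometric series” by the formula

$$G\Bigl({a_1,\ldots,a_m\atop b_1,\ldots,b_n}\Bigm| z\Bigr) = \sum_{k\ge0}\frac{a_1^{\underline{k}}\ldots a_m^{\underline{k}}}{b_1^{\underline{k}}\ldots b_n^{\underline{k}}}\,\frac{z^k}{k!},$$

using falling powers instead of the rising ones in (5.76). Explain how G is related to F.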


21 Show that Euler's definition of factorials is consistent with the ordinary definition, by showing that the limit in (5.83) is $1/\bigl((m-1)\ldots(1)\bigr)$ when z = m is a positive integer.

22 Use (5.83) to prove the factorial duplication formula:

$$x!\,(x-\tfrac12)! = (2x)!\,(-\tfrac12)!/2^{2x}.$$

23 What is the value of $F(-n,\,1;\ ;\,1)$?

24  Find  ]Tk  (m^k)  by  using  hypergeometric  series. 

25 Show that

$$(a_1-b_1)\,F\Bigl({a_1,\,a_2,\ldots,a_m\atop b_1+1,\,b_2,\ldots,b_n}\Bigm| z\Bigr) = a_1\,F\Bigl({a_1+1,\,a_2,\ldots,a_m\atop b_1+1,\,b_2,\ldots,b_n}\Bigm| z\Bigr) - b_1\,F\Bigl({a_1,\,a_2,\ldots,a_m\atop b_1,\,b_2,\ldots,b_n}\Bigm| z\Bigr).$$

Find a similar relation between the hypergeometrics $F(a_1, a_2, a_3, \ldots, a_m;\ b_1, \ldots, b_n;\ z)$, $F(a_1+1, a_2, a_3, \ldots, a_m;\ b_1, \ldots, b_n;\ z)$, and $F(a_1, a_2+1, a_3, \ldots, a_m;\ b_1, \ldots, b_n;\ z)$.

26 Express the function G(z) in the formula

$$F\Bigl({a_1,\ldots,a_m\atop b_1,\ldots,b_n}\Bigm| z\Bigr) = 1 + G(z)$$


By  the  way, 


as  a multiple  of  a hypergeometric  series. 
27 Prove that

$$F\Bigl({a_1,\ a_1+\frac12,\ \ldots,\ a_m,\ a_m+\frac12\atop b_1,\ b_1+\frac12,\ \ldots,\ b_n,\ b_n+\frac12,\ \frac12}\Bigm| (2^{m-n-1}z)^2\Bigr) = \frac12\biggl(F\Bigl({2a_1,\ldots,2a_m\atop 2b_1,\ldots,2b_n}\Bigm| z\Bigr) + F\Bigl({2a_1,\ldots,2a_m\atop 2b_1,\ldots,2b_n}\Bigm| -z\Bigr)\biggr).$$

28 Prove Euler's identity

$$F\Bigl({a,\,b\atop c}\Bigm| z\Bigr) = (1-z)^{c-a-b}\,F\Bigl({c-a,\,c-b\atop c}\Bigm| z\Bigr)$$

by applying Pfaff's reflection law (5.101) twice.
29  Show  that  confluent  hypergeometrics  satisfy 


30  What  hypergeometric  series  F satisfies  zF'(z)  + F(z)  = 1/(1  — z)? 

31 Show that if f(k) is any function summable in hypergeometric terms, then f itself is a multiple of a hypergeometric term. In other words, if $\sum f(k)\,\delta k = cF(A_1,\ldots,A_M;\ B_1,\ldots,B_N;\ Z)_k + C$, then there exist constants $a_1, \ldots, a_m$, $b_1, \ldots, b_n$, and z such that f(k) is a constant times $F(a_1,\ldots,a_m;\ b_1,\ldots,b_n;\ z)_k$.

32 Find $\sum k^2\,\delta k$ by Gosper's method.

33 Use Gosper's method to find $\sum \delta k/(k^2-1)$.

34  Show  that  a partial  hypergeometric  sum  can  always  be  represented  as  a 
limit  of  ordinary  hypergeometrics: 


= lim  F 


( -C,  Ql,  • 
\ E-C,  bi,  . 


when c is a nonnegative integer. Use this idea to evaluate $\sum_{k\le m}\binom{n}{k}(-1)^k$.

Homework  exercises 

35 The notation $\sum_{k\le n}\binom{n}{k}2^{k-n}$ is ambiguous without context. Evaluate it

a as a sum on k;
b as a sum on n.

36 Let $p^k$ be the largest power of the prime p that divides $\binom{m+n}{m}$, when m and n are nonnegative integers. Prove that k is the number of carries that occur when m is added to n in the radix p number system. Hint: Exercise 4.24 helps here.
37 Show that an analog of the binomial theorem holds for factorial powers. That is, prove the identities

$$(x+y)^{\underline{n}} = \sum_k \binom{n}{k}x^{\underline{k}}\,y^{\underline{n-k}},$$

$$(x+y)^{\overline{n}} = \sum_k \binom{n}{k}x^{\overline{k}}\,y^{\overline{n-k}},$$

for all nonnegative integers n.

38 Show that all nonnegative integers n can be represented uniquely in the form $n = \binom{a}{1} + \binom{b}{2} + \binom{c}{3}$ where a, b, and c are integers with 0 ≤ a < b < c. (This is called the binomial number system.)

39 Show that if xy = ax + by then $x^ny^n = \sum_{k=1}^n \binom{2n-1-k}{n-1}(a^nb^{n-k}x^k + a^{n-k}b^ny^k)$ for all n > 0. Find a similar formula for the more general product $x^my^n$.

40  Find  a closed  form  for 


X>ui+1 

1=1 


j + rk  + s\ 
TTV-j  )' 


integers  m,  Tl  0. 


41 Evaluate $\sum_k \binom{n}{k}k!/(n+1+k)!$ when n is a nonnegative integer.

42 Find the indefinite sum $\sum \bigl((-1)^x/\binom{n}{x}\bigr)\,\delta x$, and use it to compute the sum $\sum_{k=0}^n (-1)^k/\binom{n}{k}$ in closed form.

43 Prove the triple-binomial identity (5.28). Hint: First replace $\binom{r+k}{m+n}$ by $\sum_j \binom{r}{m+n-j}\binom{k}{j}$.

44  Use  identity  (5.32)  to  find  closed  forms  for  the  double  sums 


2j-1)i+lc 


()  + k\  /a\  /b\  /m  + n — j — k\ 

\ i Jli/wl  TTi-i  ) 


/m.  + n\ 

Vi  + kJ1 


and 


given  integers  m 1>  a 0 and  Tl  ^0. 

45  Find  a closed  form  for  ^k<n  (2]f)4~k- 

46 Evaluate the following sum in closed form, when n is a positive integer:


L 


2k  — 1 
k 


/4rt  — 2k  — 1 \ (-1)k-> 

V 2n  - k ) (2k  — 1 )(4n  — 2k  — 1 ) 


Hint:  Generating  functions  win  again. 
47 The sum $\sum_k \binom{rk+s}{k}\binom{rn-rk-s}{n-k}$ is a polynomial in r and s. Show that it doesn't depend on s.

48 The identity $\sum_{k\le n}\binom{n+k}{n}2^{-k} = 2^n$ can be combined with $\sum_{k\ge0}\binom{n+k}{n}z^k = 1/(1-z)^{n+1}$ to yield $\sum_{k>n}\binom{n+k}{n}2^{-k} = 2^n$. What is the hypergeometric form of the latter identity?

49  Use  the  hypergeometric  method  to  evaluate 


£(-l)k 


/x  + n - k\  y 
\ n — k ) y + n — k ’ 


50 Prove Pfaff's reflection law (5.101) by comparing the coefficients of $z^n$ on both sides of the equation.

51 The derivation of (5.104) shows that

$$\lim_{\epsilon\to0} F(-m,\ -2m-1+\epsilon;\ -2m+\epsilon;\ 2) = 1\Big/\binom{-1/2}{m}.$$

In this exercise we will see that slightly different limiting processes lead to distinctly different answers for the degenerate hypergeometric series F(−m, −2m − 1; −2m; 2).

a Show that $\lim_{\epsilon\to0} F(-m+\epsilon,\ -2m-1;\ -2m+2\epsilon;\ 2) = 0$, by using Pfaff's reflection law to prove the identity F(a, −2m − 1; 2a; 2) = 0 for all integers m ≥ 0.

b What is $\lim_{\epsilon\to0} F(-m+\epsilon,\ -2m-1;\ -2m+\epsilon;\ 2)$?

52  Prove  that  if  N is  a nonnegative  integer. 


• •b"F 


Nr  / C 1*1  > • • • ) Clm>  bl 


• Q 


bi, 

Ni 


) bn 

N / '1-bi-N,...  ,1-bn-N,-N 
~ZI  1— ai  -N, . . . , 1 — am— N 


(_1)m+n- 


53 If we put b = −1/2 and z = 1 in Gauss's identity (5.110), the left side reduces to −1 while the right side is +1. Why doesn't this prove that −1 = +1?

54 Explain how the right-hand side of (5.112) was obtained.

55 If the hypergeometric terms $t(k) = F(a_1,\ldots,a_m;\ b_1,\ldots,b_n;\ z)_k$ and $T(k) = F(A_1,\ldots,A_M;\ B_1,\ldots,B_N;\ Z)_k$ satisfy t(k) = c(T(k+1) − T(k)) for all k ≥ 0, show that z = Z and m − n = M − N.

56 Find a general formula for $\sum \binom{-3}{k}\,\delta k$ using Gosper's method. Show that $(-1)^{k-1}\lfloor\frac{k+1}{2}\rfloor\lfloor\frac{k+2}{2}\rfloor$ is also a solution.
57 Use Gosper's method to find a constant θ such that

$$\sum \binom{n}{k}z^k(k+\theta)\,\delta k$$

is summable in hypergeometric terms.

58 If m and n are integers with 0 ≤ m ≤ n, let

$$T_{m,n} = \sum_{0\le k<n}\binom{k}{m}\frac{1}{n-k}.$$

Find a relation between $T_{m,n}$ and $T_{m-1,n-1}$, then solve your recurrence by applying a summation factor.

Exam  problems 

59 Find a closed form for

$$\sum_{k\ge1}\binom{n}{\lfloor\log_m k\rfloor}$$

when m and n are positive integers.

60 Use Stirling's approximation (4.23) to estimate $\binom{m+n}{n}$ when m and n are both large. What does your formula reduce to when m = n?

61 Prove that when p is prime, we have

$$\binom{n}{m} \equiv \binom{\lfloor n/p\rfloor}{\lfloor m/p\rfloor}\binom{n \bmod p}{m \bmod p} \pmod p,$$

for all nonnegative integers m and n.


62 Assuming that p is prime and that m and n are positive integers, determine the value of $\binom{np}{mp} \bmod p^2$. Hint: You may wish to use the following generalization of Vandermonde's convolution:

$$\sum_{k_1+k_2+\cdots+k_m=n}\binom{r_1}{k_1}\binom{r_2}{k_2}\ldots\binom{r_m}{k_m} = \binom{r_1+r_2+\cdots+r_m}{n}.$$


63 Find a closed form for

$$\sum_{k=0}^n (-4)^k\binom{n+k}{2k},$$

given an integer n ≥ 0.
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64  Evaluate  ^ 

k=0 

65  Prove  that 


k+1 


t given  an  integer  n ^ 0. 


L{\  ^ k(k+1)!  = n- 

66  Evaluate  “Harry’s  double  sum,’ 


Z 

OsCj^k 


as  a function  of  m.  (The  sum  is  over  both  j and  k.) 
67  Find  a closed  form  for 


z 

k=0 


n 


integer  n 0. 


68  Find  a closed  form  for 
(ri 


Z 


min(k,  n — k) , integer  n ;>  0. 


69  Find  a closed  form  for 


mini 


ul 


ki  ?— "j 

ki  + ■•■+ Km=Tl 

as  a function  of  m and  n. 
70  Find  a closed  form  for 


Z 


n\/2k\/-l 


integer  n ^ 0. 


71 Let

$$S_n = \sum_{k\ge0}\binom{n+k}{m+2k}a_k,$$

where m and n are nonnegative integers, and let $A(z) = \sum_{k\ge0}a_kz^k$ be the generating function for the sequence $\langle a_0, a_1, a_2, \ldots\rangle$.

a Express the generating function $S(z) = \sum_{n\ge0}S_nz^n$ in terms of A(z).
b Use this technique to solve Problem 7 in Section 5.2.
72 Prove that, if m, n, and k are integers and n > 0,

$$\binom{m/n}{k}\,n^{2k-\nu(k)} \text{ is an integer,}$$

where ν(k) is the number of 1's in the binary representation of k.

73 Use the repertoire method to solve the recurrence

$$X_0 = \alpha; \qquad X_1 = \beta;$$

$$X_n = (n-1)(X_{n-1} + X_{n-2}), \qquad \text{for } n > 1.$$

Hint: Both n! and n¡ satisfy this recurrence.

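74 This problem concerns a deviant version of Pascal's triangle in which the sides consist of the numbers 1, 2, 3, 4, . . . instead of all 1's, although the interior numbers still satisfy the addition formula:

              1
            2   2
          3   4   3
        4   7   7   4
      5  11  14  11   5

If $\left(\!\left({n\atop k}\right)\!\right)$ denotes the kth number in row n, for 1 ≤ k ≤ n, we have $\left(\!\left({n\atop 1}\right)\!\right) = \left(\!\left({n\atop n}\right)\!\right) = n$, and $\left(\!\left({n\atop k}\right)\!\right) = \left(\!\left({n-1\atop k}\right)\!\right) + \left(\!\left({n-1\atop k-1}\right)\!\right)$ for 1 < k < n. Express the quantity $\left(\!\left({n\atop k}\right)\!\right)$ in closed form.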

75  Find  a relation  between  the  functions 


S"(u|  - E &) . 

'«>  . I ,)  . 

*<">  - L U 2) 

k 

and  the  quantities  [_2rL/3J  and  |"2n/3]. 

76 Solve the following recurrence for n, k ≥ 0:

$$Q_{n,0} = 1; \qquad Q_{0,k} = [k=0];$$

$$Q_{n,k} = Q_{n-1,k} + Q_{n-1,k-1} + \binom{n}{k}, \qquad \text{for } n, k > 0.$$
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77  What  is  the  value  of 


if  m > 1 ? 


78  Assuming  that  m is  a positive  integer,  find  a closed  form  for 


r-  / kmodrn  \ 

\(2k  + 1)  mod  (2m  + 1 ) / 

79 a What is the greatest common divisor of $\binom{2n}{1}, \binom{2n}{3}, \ldots, \binom{2n}{2n-1}$? Hint: Consider the sum of these n numbers.
b Show that the least common multiple of $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$ is equal to L(n + 1)/(n + 1), where L(n) = lcm(1, 2, . . . , n).

80 Prove that $\binom{n}{k} \le (en/k)^k$ for all integers k, n ≥ 0.

81 If 0 < θ < 1 and 0 ≤ x ≤ 1, and if l, m, n are nonnegative integers with m < n, prove the inequality


Hint: Consider taking the derivative with respect to x.

Bonus  problems 

82 Prove that Pascal's triangle has an even more surprising hexagon property than the one cited in the text:

$$\gcd\Bigl(\binom{n-1}{k-1}, \binom{n}{k+1}, \binom{n+1}{k}\Bigr) = \gcd\Bigl(\binom{n-1}{k}, \binom{n+1}{k+1}, \binom{n}{k-1}\Bigr),$$

if 0 < k < n. For example, gcd(56, 36, 210) = gcd(28, 120, 126) = 2.

83  Prove  the  amazing  identity  (5.32)  by  first  showing  that  it’s  true  whenever 
the  right-hand  side  is  zero. 

84  Show  that  the  second  pair  of  convolution  formulas,  (5.61),  follows  from 
the  first  pair,  (5.60).  Hint:  Differentiate  with  respect  to  z. 

85  Prove  that 


Ihp  L 

m=l  l$k,<k2<-<km$n 


^+ki2  + ---  + kim  + 2n^ 


(The  left  side  is  a sum  of  2n  — 1 terms.)  Hint:  Much  more  is  true 
86 Let $a_1, \ldots, a_n$ be nonnegative integers, and let $C(a_1, \ldots, a_n)$ be the coefficient of the constant term $z_1^0 \ldots z_n^0$ when the n(n − 1) factors

$$\prod_{\substack{1\le i,j\le n\\ i\ne j}}\Bigl(1 - \frac{z_i}{z_j}\Bigr)^{a_i}$$

are fully expanded into positive and negative powers of the complex variables $z_1, \ldots, z_n$.

a Prove that $C(a_1, \ldots, a_n)$ equals the left-hand side of (5.31).
b Prove that if $z_1, \ldots, z_n$ are distinct complex numbers, then the polynomial

$$f(z) = \sum_{k=1}^n \prod_{\substack{1\le j\le n\\ j\ne k}}\frac{z - z_j}{z_k - z_j}$$

is identically equal to 1.

c Multiply the original product of n(n − 1) factors by f(0) and deduce that $C(a_1, a_2, \ldots, a_n)$ is equal to

$$C(a_1-1, a_2, \ldots, a_n) + C(a_1, a_2-1, \ldots, a_n) + \cdots + C(a_1, a_2, \ldots, a_n-1).$$

(This recurrence defines multinomial coefficients, so $C(a_1, \ldots, a_n)$ must equal the right-hand side of (5.31).)

87  Let  m be  a positive  integer  and  let  £ = enl/ m , Show  that 


zmk 


(1  + m)‘B_m(zm)  -m 

_ 5-  (C2i+1z^i+1/m(CZi+1z)1/m)n+1 
0$j<m  (m+i)21+1/m(c2i+iz)-i  , 


(This  reduces  to  (5.74)  in  the  special  case  m = 1.) 
88  Prove  that  the  coefficients  S]<  in  (5.47)  are  equal  to 


(-V 


e i(]  — e 


tik-l 


dt 
t ’ 


for  all  k > 1;  hence  )sk|  <l/(k  — 1). 
89 Prove that (5.19) has an infinite counterpart,

$$\sum_{k>m}\binom{m+r}{k}x^ky^{m-k} = \sum_{k>m}\binom{-r}{k}(-x)^k(x+y)^{m-k}, \qquad \text{integer } m,$$

if |x| < |y| and |x| < |x + y|. Differentiate this identity n times with respect to y and express it in terms of hypergeometrics; what relation do you get?

90 Problem 1 in Section 5.2 considers $\sum_{k\ge0}\binom{r}{k}\big/\binom{s}{k}$ when r and s are integers with s ≥ r ≥ 0. What is the value of this sum if r and s aren't integers?

91  Prove  Whipple's  identity, 


( Ia'  i*+*.  !-a- 

\ 1+Q  — b,  1+Q  — 


b - c 
c 


— 4z 


= (1  -*)aF 


n-z)2 

a,  b,  c 

1 +a— b,  1 +a— c 


by  showing  that  both  sides  satisfy  the  same  differential  equation. 
92  Prove  Clausen’s  product  identities 


FUb+1i‘): 


F ( l + a>  l+b 

1 +a+b 


2 F 


P / 2a,  a+b,  2b 

\2a+2b,  a + b+  j 

= p/1,  $ + a-b , 1-a+b 
V 1 +a+b,  1 - a - b 

What  identities  result  when  the  coefficients  of  zn  on  both  sides  of  these 
formulas  are  equated? 

93  Show  that  the  indefinite  sum 

L(n(f0)  + «)/nf0)) 6k 

has  a (fairly)  simple  form,  given  any  function  f and  any  constant  a. 

94  Show  that  if  w = e27ri^3  we  have 

1 2 , , / 4n  \ 

integer  n ^ 0. 


y ( 3n 

L-  \k,  l,  m 


k+l-f  m=3n 


CU 


l+2m 


4n 

n,n,2nj  ’ 




Research  problems 

95 Let q(n) be the smallest odd prime factor of the middle binomial coefficient $\binom{2n}{n}$. According to exercise 36, the odd primes p that do not divide $\binom{2n}{n}$ are those for which all digits in n's radix p representation are (p − 1)/2 or less. Computer experiments have shown that q(n) ≤ 11 for all n < $10^{10000}$, except that q(3160) = 13.

a Is q(n) ≤ 11 for all n > 3160?
b Is q(n) = 11 for infinitely many n?

A reward  of  $('^)  Q (3)  is  offered  for  a solution  to  either  (a)  or  (b). 

96 Is $\binom{2n}{n}$ divisible by the square of a prime, for all n > 4?

97 For what values of n is $\binom{2n}{n} \equiv (-1)^n \pmod{2n+1}$?


Special  Numbers 


SOME SEQUENCES of numbers arise so often in mathematics that we recognize them instantly and give them special names. For example, everybody who learns arithmetic knows the sequence of square numbers (1, 4, 9, 16, . . . ). In Chapter 1 we encountered the triangular numbers (1, 3, 6, 10, . . . ); in Chapter 4 we studied the prime numbers (2, 3, 5, 7, . . . ); in Chapter 5 we looked briefly at the Catalan numbers (1, 2, 5, 14, . . . ).

In the present chapter we'll get to know a few other important sequences. First on our agenda will be the Stirling numbers ${n\brace k}$ and ${n\brack k}$, and the Eulerian numbers $\left\langle{n\atop k}\right\rangle$; these form triangular patterns of coefficients analogous to the binomial coefficients $\binom{n}{k}$ in Pascal's triangle. Then we'll take a good look at the harmonic numbers $H_n$ and the Bernoulli numbers $B_n$; these differ from the other sequences we've been studying because they're fractions, not integers. Finally, we'll examine the fascinating Fibonacci numbers $F_n$ and some of their important generalizations.

6.1  STIRLING  NUMBERS 

We begin with some close relatives of the binomial coefficients, the Stirling numbers, named after James Stirling (1692-1770). These numbers come in two flavors, traditionally called by the no-frills names “Stirling numbers of the first and second kind.” Although they have a venerable history and numerous applications, they still lack a standard notation. We will write ${n\brace k}$ for Stirling numbers of the second kind and ${n\brack k}$ for Stirling numbers of the first kind, because these symbols turn out to be more user-friendly than the many other notations that people have tried.

Tables 244 and 245 show what ${n\brace k}$ and ${n\brack k}$ look like when n and k are small. A problem that involves the numbers “1, 7, 6, 1” is likely to be related to ${n\brace k}$, and a problem that involves “6, 11, 6, 1” is likely to be related to ${n\brack k}$, just as we assume that a problem involving “1, 4, 6, 4, 1” is likely to be related to $\binom{n}{k}$; these are the trademark sequences that appear when n = 4.
Stirling numbers of the second kind show up more often than those of the other variety, so let's consider last things first. The symbol ${n\brace k}$ stands for the number of ways to partition a set of n things into k nonempty subsets. For example, there are seven ways to split a four-element set into two parts:

$$\{1,2,3\}\cup\{4\},\quad \{1,2,4\}\cup\{3\},\quad \{1,3,4\}\cup\{2\},\quad \{2,3,4\}\cup\{1\},$$

$$\{1,2\}\cup\{3,4\},\quad \{1,3\}\cup\{2,4\},\quad \{1,4\}\cup\{2,3\}; \tag{6.1}$$

thus ${4\brace 2} = 7$. Notice that curly braces are used to denote sets as well as the numbers ${n\brace k}$. This notational kinship helps us remember the meaning of ${n\brace k}$, which can be read “n subset k.”

(Stirling himself considered this kind first in his book [281].)

Let's look at small k. There's just one way to put n elements into a single nonempty set; hence ${n\brace 1} = 1$, for all n > 0. On the other hand ${0\brace 1} = 0$, because a 0-element set is empty.

The case k = 0 is a bit tricky. Things work out best if we agree that there's just one way to partition an empty set into zero nonempty parts; hence ${0\brace 0} = 1$. But a nonempty set needs at least one part, so ${n\brace 0} = 0$ for n > 0.

What happens when k = 2? Certainly ${0\brace 2} = 0$. If a set of n > 0 objects is divided into two nonempty parts, one of those parts contains the last object and some subset of the first n − 1 objects. There are $2^{n-1}$ ways to choose the latter subset, since each of the first n − 1 objects is either in it or out of it; but we mustn't put all of those objects in it, because we want to end up with two nonempty parts. Therefore we subtract 1:

$${n\brace 2} = 2^{n-1} - 1, \qquad \text{integer } n > 0. \tag{6.2}$$

(This tallies with our enumeration of ${4\brace 2} = 7 = 2^3 - 1$ ways above.)
Table 245  Stirling's triangle for cycles.

A modification of this argument leads to a recurrence by which we can compute ${n\brace k}$ for all k: Given a set of n > 0 objects to be partitioned into k nonempty parts, we either put the last object into a class by itself (in ${n-1\brace k-1}$ ways), or we put it together with some nonempty subset of the first n − 1 objects. There are $k{n-1\brace k}$ possibilities in the latter case, because each of the ${n-1\brace k}$ ways to distribute the first n − 1 objects into k nonempty parts gives k subsets that the nth object can join. Hence

$${n\brace k} = k{n-1\brace k} + {n-1\brace k-1}, \qquad \text{integer } n > 0. \tag{6.3}$$

This is the law that generates Table 244; without the factor of k it would reduce to the addition formula (5.8) that generates Pascal's triangle.
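
Recurrence (6.3), together with the boundary values ${0\brace 0} = 1$ and ${n\brace 0} = {0\brace n} = 0$ for n > 0, translates directly into a short memoized function. This is a minimal sketch; the name `stirling_subset` is ours, and treating negative k as giving 0 is a convenience of the sketch.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling_subset(n, k):
    """Number of ways to partition an n-element set into k nonempty subsets."""
    if n < 0 or k < 0:
        return 0
    if n == 0:
        return 1 if k == 0 else 0      # only the empty partition of the empty set
    # Recurrence (6.3): the last object joins one of k parts, or stands alone.
    return k * stirling_subset(n - 1, k) + stirling_subset(n - 1, k - 1)

def subset_row(n):
    """Row n of the triangle, for k = 0, 1, ..., n."""
    return [stirling_subset(n, k) for k in range(n + 1)]
```

Row 4 comes out as 0, 1, 7, 6, 1, the trademark sequence mentioned earlier, and the k = 2 column agrees with the closed form (6.2).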

And now, Stirling numbers of the first kind. These are somewhat like the others, but ${n\brack k}$ counts the number of ways to arrange n objects into k cycles instead of subsets. We verbalize ‘${n\brack k}$’ by saying “n cycle k.”

Cycles are cyclic arrangements, like the necklaces we considered in Chapter 4. The cycle

(cycle diagram: A, B, C, D arranged around a circle)

can be written more compactly as ‘[A, B, C, D]’, with the understanding that

[A, B, C, D] = [B, C, D, A] = [C, D, A, B] = [D, A, B, C];

a cycle “wraps around” because its end is joined to its beginning. On the other hand, the cycle [A, B, C, D] is not the same as [A, B, D, C] or [D, C, B, A].
There are eleven different ways to make two cycles from four elements:

[1,2,3] [4],  [1,2,4] [3],  [1,3,4] [2],  [2,3,4] [1],
[1,3,2] [4],  [1,4,2] [3],  [1,4,3] [2],  [2,4,3] [1],
[1,2] [3,4],  [1,3] [2,4],  [1,4] [2,3];                      (6.4)

hence ${4 \brack 2} = 11$.

A singleton  cycle  (that  is,  a cycle  with  only  one  element)  is  essentially 
the  same  as  a singleton  set  (a  set  with  only  one  element).  Similarly,  a 2-cycle 
is like a 2-set, because we have [A, B] = [B, A] just as {A, B} = {B, A}. But
there  are  two  different  3-cycles,  [A,  B,  C]  and  [A,  C,  B].  Notice,  for  example, 
that  the  eleven  cycle  pairs  in  (6.4)  can  be  obtained  from  the  seven  set  pairs 
in  (6.1)  by  making  two  cycles  from  each  of  the  3-element  sets. 

In general, $n!/n = (n-1)!$ cycles can be made from any $n$-element set,
whenever $n > 0$. (There are $n!$ permutations, and each cycle corresponds
to $n$ of them because any one of its elements can be listed first.) Therefore
we have

$$ {n \brack 1} = (n-1)! , \qquad \text{integer } n > 0. \tag{6.5} $$


This is much larger than the value ${n \brace 1} = 1$ we had for Stirling subset numbers.
In fact, it is easy to see that the cycle numbers must be at least as large as
the subset numbers,

$$ {n \brack k} \ge {n \brace k} , \qquad \text{integers } n, k \ge 0, \tag{6.6} $$

because every partition into nonempty subsets leads to at least one arrangement
of cycles.

Equality holds in (6.6) when all the cycles are necessarily singletons or
doubletons, because cycles are equivalent to subsets in such cases. This happens
when $k = n$ and when $k = n - 1$; hence

$$ {n \brack n} = {n \brace n} ; \qquad {n \brack n-1} = {n \brace n-1} . $$

In fact, it is easy to see that

$$ {n \brack n} = {n \brace n} = 1 ; \qquad {n \brack n-1} = {n \brace n-1} = \binom{n}{2} . \tag{6.7} $$


(The number of ways to arrange $n$ objects into $n - 1$ cycles or subsets is
the number of ways to choose the two objects that will be in the same cycle
or subset.) The triangular numbers $\binom{n}{2} = 1, 3, 6, 10, \ldots$ are conspicuously
present in both Table 244 and Table 245.


“There are nine
and sixty ways
of constructing
tribal lays,
And-every-single-one-of-them-is-right.”

-Rudyard Kipling
We can derive a recurrence for ${n \brack k}$ by modifying the argument we used
for ${n \brace k}$. Every arrangement of $n$ objects in $k$ cycles either puts the last object
into a cycle by itself (in ${n-1 \brack k-1}$ ways) or inserts that object into one of the
${n-1 \brack k}$ cycle arrangements of the first $n - 1$ objects. In the latter case, there are
$n - 1$ different ways to do the insertion. (This takes some thought, but it’s not hard
to verify that there are $j$ ways to put a new element into a $j$-cycle in order to
make a $(j+1)$-cycle. When $j = 3$, for example, the cycle [A, B, C] leads to

[A, B, C, D],  [A, B, D, C],  or  [A, D, B, C]

when we insert a new element D, and there are no other possibilities. Summing
over all $j$ gives a total of $n - 1$ ways to insert an $n$th object into a cycle
decomposition of $n - 1$ objects.) The desired recurrence is therefore

$$ {n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1} , \qquad \text{integer } n > 0. \tag{6.8} $$


This  is  the  addition-formula  analog  that  generates  Table  245. 
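
As with the subset numbers, recurrence (6.8) is easy to run mechanically. The sketch below is a hypothetical helper of our own (`stirling_cycle` is not a name from the text); it assumes the starting value 1 for zero objects in zero cycles and then checks the row for $n = 4$ against the count of eleven found in (6.4) and the value $(n-1)!$ from (6.5).

```python
from math import factorial

def stirling_cycle(n_max):
    """Rows 0..n_max of Stirling's triangle for cycles, built from (6.8)."""
    table = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    table[0][0] = 1  # assumed boundary value: zero objects, zero cycles
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            # last object in its own cycle, or inserted in one of n-1 places
            table[n][k] = (n - 1) * table[n - 1][k] + table[n - 1][k - 1]
    return table

C = stirling_cycle(6)
print(C[4][1:5])  # [6, 11, 6, 1]
one_cycle_ok = all(C[n][1] == factorial(n - 1) for n in range(1, 7))  # (6.5)
```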

Comparison of (6.8) and (6.3) shows that the first term on the right side is
multiplied by its upper index $(n-1)$ in the case of Stirling cycle numbers, but
by its lower index $k$ in the case of Stirling subset numbers. We can therefore
perform “absorption” in terms like $n{n \brack k}$ and $k{n \brace k}$, when we do proofs by
mathematical induction.

Every  permutation  is  equivalent  to  a set  of  cycles.  For  example,  consider 
the  permutation  that  takes  123456789  into  384729156.  We  can  conveniently 
represent  it  in  two  rows, 


123456789 

384729156, 


showing that 1 goes to 3 and 2 goes to 8, etc. The cycle structure comes
about because 1 goes to 3, which goes to 4, which goes to 7, which goes back
to 1; that’s the cycle [1,3,4,7]. Another cycle in this permutation is [2,8,5];
still another is [6,9]. Therefore the permutation 384729156 is equivalent to
the cycle arrangement

[1,3,4,7] [2,8,5] [6,9].

If we have any permutation $\pi_1 \pi_2 \ldots \pi_n$ of $\{1, 2, \ldots, n\}$, every element is in a
unique cycle. For if we start with $m_0 = m$ and look at $m_1 = \pi_{m_0}$, $m_2 = \pi_{m_1}$,
etc., we must eventually come back to $m_k = m_0$. (The numbers must repeat
sooner or later, and the first number to reappear must be $m_0$ because
we know the unique predecessors of the other numbers $m_1, m_2, \ldots, m_{k-1}$.)
Therefore every permutation defines a cycle arrangement. Conversely, every
cycle arrangement obviously defines a permutation if we reverse the construction,
and this one-to-one correspondence shows that permutations and cycle
arrangements are essentially the same thing.
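
The construction just described (start at an element, follow the permutation until the starting element reappears) can be sketched in Python as follows. The function name `cycles_of` and the convention that each cycle is listed starting from its smallest element are our own assumptions for the illustration.

```python
def cycles_of(perm):
    """perm[i-1] is the image of i; return the cycles of the permutation."""
    n = len(perm)
    seen = [False] * (n + 1)
    cycles = []
    for start in range(1, n + 1):
        if seen[start]:
            continue
        cycle, m = [], start
        while not seen[m]:      # follow m -> perm[m] until we return to start
            seen[m] = True
            cycle.append(m)
            m = perm[m - 1]
        cycles.append(cycle)
    return cycles

perm = [int(c) for c in "384729156"]
print(cycles_of(perm))  # [[1, 3, 4, 7], [2, 8, 5], [6, 9]]
```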

Therefore ${n \brack k}$ is the number of permutations of $n$ objects that contain
exactly $k$ cycles. If we sum ${n \brack k}$ over all $k$, we must get the total number of
permutations:

$$ \sum_{k=0}^{n} {n \brack k} = n! , \qquad \text{integer } n \ge 0. \tag{6.9} $$

For example, 6 + 11 + 6 + 1 = 24 = 4!.

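Stirling numbers are useful because the recurrence relations (6.3) and
(6.8) arise in a variety of problems. For example, if we want to represent
ordinary powers $x^n$ by falling powers $x^{\underline{n}}$, we find that the first few cases are

$$ \begin{aligned}
x^0 &= x^{\underline{0}} ; \\
x^1 &= x^{\underline{1}} ; \\
x^2 &= x^{\underline{2}} + x^{\underline{1}} ; \\
x^3 &= x^{\underline{3}} + 3x^{\underline{2}} + x^{\underline{1}} ; \\
x^4 &= x^{\underline{4}} + 6x^{\underline{3}} + 7x^{\underline{2}} + x^{\underline{1}} .
\end{aligned} $$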


These  coefficients  look  suspiciously  like  the  numbers  in  Table  244,  reflected 
between  left  and  right;  therefore  we  can  be  pretty  confident  that  the  general 
formula  is 


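$$ x^n = \sum_k {n \brace k} x^{\underline{k}} , \qquad \text{integer } n \ge 0. \tag{6.10} $$

We’d better define ${n \brace k} = {n \brack k} = 0$ when $k < 0$ and $n \ge 0$.

And sure enough, a simple proof by induction clinches the argument: We
have $x \cdot x^{\underline{k}} = x^{\underline{k+1}} + kx^{\underline{k}}$, because $x^{\underline{k+1}} = x^{\underline{k}}(x - k)$; hence $x \cdot x^{n-1}$ is

$$ \begin{aligned}
x \sum_k {n-1 \brace k} x^{\underline{k}}
 &= \sum_k {n-1 \brace k} x^{\underline{k+1}} + \sum_k {n-1 \brace k} k x^{\underline{k}} \\
 &= \sum_k {n-1 \brace k-1} x^{\underline{k}} + \sum_k {n-1 \brace k} k x^{\underline{k}} \\
 &= \sum_k \left( k{n-1 \brace k} + {n-1 \brace k-1} \right) x^{\underline{k}}
  = \sum_k {n \brace k} x^{\underline{k}} .
\end{aligned} $$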

In  other  words,  Stirling  subset  numbers  are  the  coefficients  of  factorial  powers 
that  yield  ordinary  powers. 
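
A quick numerical check of (6.10) can make the conversion concrete. The sketch below is our own illustration, not part of the original derivation; `falling`, `subset_numbers`, and `power_via_falling` are hypothetical helper names, and the Stirling subset numbers are generated from recurrence (6.3).

```python
def falling(x, k):
    """Falling factorial power x(x-1)...(x-k+1)."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

def subset_numbers(n):
    """Row n of Stirling's triangle for subsets, from recurrence (6.3)."""
    row = [1]
    for m in range(1, n + 1):
        prev = row + [0]
        row = [0] * (m + 1)
        for k in range(1, m + 1):
            row[k] = k * prev[k] + prev[k - 1]
    return row

def power_via_falling(x, n):
    """Right-hand side of (6.10): sum over k of {n k} times x to the k falling."""
    return sum(c * falling(x, k) for k, c in enumerate(subset_numbers(n)))

print(subset_numbers(4))        # [0, 1, 7, 6, 1]
print(power_via_falling(5, 4))  # 625
```

Because both sides of (6.10) are polynomials of degree $n$ in $x$, agreement at more than $n$ integer points is already enough to confirm the identity for that $n$.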
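We can go the other way too, because Stirling cycle numbers are the
coefficients of ordinary powers that yield factorial powers:

$$ \begin{aligned}
x^{\overline{0}} &= x^0 ; \\
x^{\overline{1}} &= x^1 ; \\
x^{\overline{2}} &= x^2 + x^1 ; \\
x^{\overline{3}} &= x^3 + 3x^2 + 2x^1 ; \\
x^{\overline{4}} &= x^4 + 6x^3 + 11x^2 + 6x^1 .
\end{aligned} $$

We have $(x + n - 1) \cdot x^k = x^{k+1} + (n-1)x^k$, so a proof like the one just given
shows that

$$ (x + n - 1)\,x^{\overline{n-1}} = (x + n - 1) \sum_k {n-1 \brack k} x^k = \sum_k {n \brack k} x^k . $$

This leads to a proof by induction of the general formula

$$ x^{\overline{n}} = \sum_k {n \brack k} x^k , \qquad \text{integer } n \ge 0. \tag{6.11} $$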


(Setting  x = 1 gives  (6.9)  again.) 

But wait, you say. This equation involves rising factorial powers $x^{\overline{n}}$, while
(6.10) involves falling factorials $x^{\underline{n}}$. What if we want to express $x^{\underline{n}}$ in terms of
ordinary powers, or if we want to express $x^n$ in terms of rising powers? Easy;
we just throw in some minus signs and get

$$ x^n = \sum_k {n \brace k} (-1)^{n-k} x^{\overline{k}} , \qquad \text{integer } n \ge 0; \tag{6.12} $$

$$ x^{\underline{n}} = \sum_k {n \brack k} (-1)^{n-k} x^k , \qquad \text{integer } n \ge 0. \tag{6.13} $$

This works because, for example, the formula

$$ x^{\underline{4}} = x(x-1)(x-2)(x-3) = x^4 - 6x^3 + 11x^2 - 6x $$

is just like the formula

$$ x^{\overline{4}} = x(x+1)(x+2)(x+3) = x^4 + 6x^3 + 11x^2 + 6x $$

but with alternating signs. The general identity

$$ x^{\underline{n}} = (-1)^n (-x)^{\overline{n}} \tag{6.14} $$

of exercise 2.17 converts (6.10) to (6.12) and (6.11) to (6.13) if we negate $x$.
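Table 250 Basic Stirling number identities, for integer $n \ge 0$.

Recurrences:

$$ {n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1} ; \qquad
   {n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1} . $$

Special values:

$$ \begin{aligned}
{n \brace 0} &= {n \brack 0} = [n = 0] ; \\
{n \brace 1} &= [n > 0] ; & {n \brack 1} &= (n-1)!\,[n > 0] ; \\
{n \brace 2} &= (2^{n-1} - 1)\,[n > 0] ; & {n \brack 2} &= (n-1)!\,H_{n-1}\,[n > 0] ; \\
{n \brace n-1} &= {n \brack n-1} = \binom{n}{2} ; \\
{n \brace n} &= {n \brack n} = \binom{n}{n} = 1 ; \\
{n \brace k} &= {n \brack k} = \binom{n}{k} = 0 , & & \text{if } k > n.
\end{aligned} $$

Converting between powers:

$$ \begin{aligned}
x^n &= \sum_k {n \brace k} x^{\underline{k}} = \sum_k {n \brace k} (-1)^{n-k} x^{\overline{k}} ; \\
x^{\underline{n}} &= \sum_k {n \brack k} (-1)^{n-k} x^k ; \\
x^{\overline{n}} &= \sum_k {n \brack k} x^k .
\end{aligned} $$

Inversion formulas:

$$ \sum_k {n \brack k} {k \brace m} (-1)^{n-k} = [m = n] ; \qquad
   \sum_k {n \brace k} {k \brack m} (-1)^{n-k} = [m = n] . $$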
Table 251 Additional Stirling number identities, for integers $l, m, n \ge 0$.

[Table body not recoverable from the source; it lists identities (6.15) through (6.29).]
We can remember when to stick the $(-1)^{n-k}$ factor into a formula like
(6.12) because there’s a natural ordering of powers when $x$ is large:

$$ x^{\overline{n}} > x^n > x^{\underline{n}} , \qquad \text{for all } x > n > 1. \tag{6.30} $$

The Stirling numbers ${n \brack k}$ and ${n \brace k}$ are nonnegative, so we have to use minus
signs when expanding a “small” power in terms of “large” ones.

We can plug (6.11) into (6.12) and get a double sum:

$$ x^n = \sum_k {n \brace k} (-1)^{n-k} x^{\overline{k}} = \sum_{k,m} {n \brace k} {k \brack m} (-1)^{n-k} x^m . $$

This holds for all $x$, so the coefficients of $x^0, x^1, \ldots, x^{n-1}, x^{n+1}, x^{n+2}, \ldots$ on
the right must all be zero and we must have the identity

$$ \sum_k {n \brace k} {k \brack m} (-1)^{n-k} = [m = n] , \qquad \text{integers } m, n \ge 0. \tag{6.31} $$

Stirling numbers, like binomial coefficients, satisfy many surprising identities.
But these identities aren’t as versatile as the ones we had in Chapter 5,
so  they  aren’t  applied  nearly  as  often.  Therefore  it’s  best  for  us  just  to  list 
the  simplest  ones,  for  future  reference  when  a tough  Stirling  nut  needs  to  be 
cracked.  Tables  250  and  251  contain  the  formulas  that  are  most  frequently 
useful;  the  principal  identities  we  have  already  derived  are  repeated  there. 

When we studied binomial coefficients in Chapter 5, we found that it
was advantageous to define $\binom{n}{k}$ for negative $n$ in such a way that the identity
$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$ is valid without any restrictions. Using that identity to
extend the $\binom{n}{k}$’s beyond those with combinatorial significance, we discovered
(in Table 164) that Pascal’s triangle essentially reproduces itself in a rotated
form when we extend it upward. Let’s try the same thing with Stirling’s
triangles: What happens if we decide that the basic recurrences

$$ \begin{aligned}
{n \brace k} &= k{n-1 \brace k} + {n-1 \brace k-1} \\
{n \brack k} &= (n-1){n-1 \brack k} + {n-1 \brack k-1}
\end{aligned} $$

are valid for all integers $n$ and $k$? The solution becomes unique if we make
the reasonable additional stipulations that

$$ {0 \brace k} = {0 \brack k} = [k = 0] \quad \text{and} \quad {n \brace 0} = {n \brack 0} = [n = 0] . \tag{6.32} $$
Table 253 Stirling’s triangles in tandem.


In fact, a surprisingly pretty pattern emerges: Stirling’s triangle for cycles
appears above Stirling’s triangle for subsets, and vice versa! The two kinds
of Stirling numbers are related by an extremely simple law:

$$ {n \brack k} = {-k \brace -n} , \qquad \text{integers } k, n. \tag{6.33} $$

We have “duality,” something like the relations between min and max, between
$\lfloor x \rfloor$ and $\lceil x \rceil$, between $x^{\underline{n}}$ and $x^{\overline{n}}$, between gcd and lcm. It’s easy to check that
both of the recurrences ${n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1}$ and ${n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1}$
amount to the same thing, under this correspondence.
6.2 EULERIAN NUMBERS

Another triangle of values pops up now and again, this one due to
Euler [88, page 485], and we denote its elements by $\left\langle{n \atop k}\right\rangle$. The angle brackets
in this case suggest “less than” and “greater than” signs; $\left\langle{n \atop k}\right\rangle$ is the number of
permutations $\pi_1 \pi_2 \ldots \pi_n$ of $\{1, 2, \ldots, n\}$ that have $k$ ascents, namely, $k$ places
where $\pi_j < \pi_{j+1}$. (Caution: This notation is even less standard than our
notations ${n \brack k}$, ${n \brace k}$ for Stirling numbers. But we’ll see that it makes good sense.)

For example, eleven permutations of {1,2,3,4} have two ascents:

1324, 1423, 2314, 2413, 3412;
1243, 1342, 2341; 2134, 3124, 4123.

(The first row lists the permutations with $\pi_1 < \pi_2 > \pi_3 < \pi_4$; the second row
lists those with $\pi_1 < \pi_2 < \pi_3 > \pi_4$ and $\pi_1 > \pi_2 < \pi_3 < \pi_4$.) Hence $\left\langle{4 \atop 2}\right\rangle = 11$.

(Knuth [175, first
edition] used …
Table 254 Euler’s triangle.


Table 254 lists the smallest Eulerian numbers; notice that the trademark
sequence is 1, 11, 11, 1 this time. There can be at most $n - 1$ ascents, when
$n > 0$, so we have $\left\langle{n \atop n}\right\rangle = [n = 0]$ on the diagonal of the triangle.

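Euler’s triangle, like Pascal’s, is symmetric between left and right. But
in this case the symmetry law is slightly different:

$$ \left\langle{n \atop k}\right\rangle = \left\langle{n \atop n-1-k}\right\rangle , \qquad \text{integer } n > 0; \tag{6.34} $$

The permutation $\pi_1 \pi_2 \ldots \pi_n$ has $n - 1 - k$ ascents if and only if its “reflection”
$\pi_n \ldots \pi_2 \pi_1$ has $k$ ascents.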

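Let’s try to find a recurrence for $\left\langle{n \atop k}\right\rangle$. Each permutation $\rho = \rho_1 \ldots \rho_{n-1}$
of $\{1, \ldots, n-1\}$ leads to $n$ permutations of $\{1, 2, \ldots, n\}$ if we insert the new
element $n$ in all possible ways. Suppose we put $n$ in position $j$, obtaining the
permutation $\pi = \rho_1 \ldots \rho_{j-1}\, n\, \rho_j \ldots \rho_{n-1}$. The number of ascents in $\pi$ is the
same as the number in $\rho$, if $j = 1$ or if $\rho_{j-1} < \rho_j$; it’s one greater than the
number in $\rho$, if $\rho_{j-1} > \rho_j$ or if $j = n$. Therefore $\pi$ has $k$ ascents in a total of
$(k+1)\left\langle{n-1 \atop k}\right\rangle$ ways from permutations $\rho$ that have $k$ ascents, plus a total of
$\bigl((n-2) - (k-1) + 1\bigr)\left\langle{n-1 \atop k-1}\right\rangle$ ways from permutations $\rho$ that have $k - 1$ ascents.
The desired recurrence is

$$ \left\langle{n \atop k}\right\rangle = (k+1)\left\langle{n-1 \atop k}\right\rangle + (n-k)\left\langle{n-1 \atop k-1}\right\rangle , \qquad \text{integer } n > 0. \tag{6.35} $$

Once again we start the recurrence off by setting

$$ \left\langle{0 \atop k}\right\rangle = [k = 0] , \qquad \text{integer } k, \tag{6.36} $$

and we will assume that $\left\langle{n \atop k}\right\rangle = 0$ when $k < 0$.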
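
The insertion argument can be checked by brute force for small $n$. The following Python sketch is our own illustration (the names `eulerian`, `ascents`, and `brute_force_row` are hypothetical); it builds Euler's triangle from the recurrence with coefficients $k+1$ and $n-k$, starting from the value 1 for $n = k = 0$, and compares each row with a direct count of ascents over all permutations.

```python
from itertools import permutations

def eulerian(n_max):
    """Rows 0..n_max of Euler's triangle from recurrence (6.35)."""
    table = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    table[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(n):
            left = table[n - 1][k - 1] if k > 0 else 0
            table[n][k] = (k + 1) * table[n - 1][k] + (n - k) * left
    return table

def ascents(p):
    """Number of places j where p[j] < p[j+1]."""
    return sum(1 for a, b in zip(p, p[1:]) if a < b)

def brute_force_row(n):
    """Count the permutations of 1..n by number of ascents."""
    counts = [0] * (n + 1)
    for p in permutations(range(1, n + 1)):
        counts[ascents(p)] += 1
    return counts

E = eulerian(6)
print(E[4][:4])  # [1, 11, 11, 1]
```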
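Eulerian numbers are useful primarily because they provide an unusual
connection between ordinary powers and consecutive binomial coefficients:

$$ x^n = \sum_k \left\langle{n \atop k}\right\rangle \binom{x+k}{n} , \qquad \text{integer } n \ge 0. \tag{6.37} $$

(This is “Worpitzky’s identity” [308].) For example, we have

$$ \begin{aligned}
x^2 &= \binom{x}{2} + \binom{x+1}{2} , \\
x^3 &= \binom{x}{3} + 4\binom{x+1}{3} + \binom{x+2}{3} , \\
x^4 &= \binom{x}{4} + 11\binom{x+1}{4} + 11\binom{x+2}{4} + \binom{x+3}{4} ,
\end{aligned} $$

and so on. It’s easy to prove (6.37) by induction (exercise 14).

Incidentally, (6.37) gives us yet another way to obtain the sum of the
first $n$ squares: We have $k^2 = \left\langle{2 \atop 0}\right\rangle\binom{k}{2} + \left\langle{2 \atop 1}\right\rangle\binom{k+1}{2} = \binom{k}{2} + \binom{k+1}{2}$, hence

$$ \begin{aligned}
1^2 + 2^2 + \cdots + n^2
 &= \Bigl( \tbinom{1}{2} + \tbinom{2}{2} + \cdots + \tbinom{n}{2} \Bigr)
  + \Bigl( \tbinom{2}{2} + \tbinom{3}{2} + \cdots + \tbinom{n+1}{2} \Bigr) \\
 &= \binom{n+1}{3} + \binom{n+2}{3} = \tfrac{1}{6}(n+1)n\bigl((n-1) + (n+2)\bigr) .
\end{aligned} $$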
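
Worpitzky's identity (6.37) and the resulting sum-of-squares formula are easy to test numerically. This is a small sketch under our own naming (`eulerian_row`, `worpitzky`, and `sum_of_squares` are hypothetical helpers); it restricts $x$ to nonnegative integers because `math.comb` accepts only nonnegative arguments.

```python
from math import comb

def eulerian_row(n):
    """Row n of Euler's triangle, from recurrence (6.35)."""
    row = [1]  # row 0
    for m in range(1, n + 1):
        prev = row + [0]
        row = [(k + 1) * prev[k] + (m - k) * (prev[k - 1] if k else 0)
               for k in range(m)]
    return row

def worpitzky(x, n):
    """Right-hand side of (6.37) for a nonnegative integer x."""
    return sum(c * comb(x + k, n) for k, c in enumerate(eulerian_row(n)))

def sum_of_squares(n):
    """Closed form derived in the text: C(n+1,3) + C(n+2,3)."""
    return comb(n + 1, 3) + comb(n + 2, 3)

print(worpitzky(7, 3))     # 343
print(sum_of_squares(10))  # 385
```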

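The Eulerian recurrence (6.35) is a bit more complicated than the Stirling
recurrences (6.3) and (6.8), so we don’t expect the numbers $\left\langle{n \atop k}\right\rangle$ to satisfy as
many simple identities. Still, there are a few:

$$ \left\langle{n \atop m}\right\rangle = \sum_{k=0}^{m} \binom{n+1}{k} (m+1-k)^n (-1)^k ; \tag{6.38} $$

$$ m!\,{n \brace m} = \sum_k \left\langle{n \atop k}\right\rangle \binom{k}{n-m} ; \tag{6.39} $$

$$ \left\langle{n \atop m}\right\rangle = \sum_k {n \brace k} \binom{n-k}{m} (-1)^{n-k-m}\, k! . \tag{6.40} $$

If we multiply (6.39) by $z^{n-m}$ and sum on $m$, we get
$\sum_m {n \brace m}\, m!\, z^{n-m} = \sum_k \left\langle{n \atop k}\right\rangle (z+1)^k$. Replacing $z$ by $z - 1$ and equating coefficients of $z^k$ gives
(6.40). Thus the last two of these identities are essentially equivalent. The
first identity, (6.38), gives us special values when $m$ is small:

$$ \left\langle{n \atop 0}\right\rangle = 1 ; \qquad
   \left\langle{n \atop 1}\right\rangle = 2^n - n - 1 ; \qquad
   \left\langle{n \atop 2}\right\rangle = 3^n - (n+1)2^n + \binom{n+1}{2} . $$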
We needn’t dwell further on Eulerian numbers here; it’s usually sufficient
simply to know that they exist, and to have a list of basic identities to fall
back on when the need arises. However, before we leave this topic, we should
take note of yet another triangular pattern of coefficients, shown in Table 256.
We call these “second-order Eulerian numbers” $\left\langle\!\left\langle{n \atop k}\right\rangle\!\right\rangle$, because they satisfy a
recurrence similar to (6.35) but with $n$ replaced by $2n - 1$ in one place:

$$ \left\langle\!\left\langle{n \atop k}\right\rangle\!\right\rangle = (k+1)\left\langle\!\left\langle{n-1 \atop k}\right\rangle\!\right\rangle + (2n-1-k)\left\langle\!\left\langle{n-1 \atop k-1}\right\rangle\!\right\rangle . \tag{6.41} $$
These numbers have a curious combinatorial interpretation, first noticed by
Gessel and Stanley [118]: If we form permutations of the multiset $\{1, 1, 2, 2, \ldots, n, n\}$
with the special property that all numbers between the two occurrences
of $m$ are greater than $m$, for $1 \le m \le n$, then $\left\langle\!\left\langle{n \atop k}\right\rangle\!\right\rangle$ is the number of
such permutations that have $k$ ascents. For example, there are eight suitable
single-ascent permutations of {1, 1, 2, 2, 3, 3}:

113322, 133221, 221331, 221133, 223311, 233211, 331122, 331221.

Thus $\left\langle\!\left\langle{3 \atop 1}\right\rangle\!\right\rangle = 8$. The multiset $\{1, 1, 2, 2, \ldots, n, n\}$ has a total of

$$ \sum_k \left\langle\!\left\langle{n \atop k}\right\rangle\!\right\rangle = (2n-1)(2n-3) \ldots (1) = \frac{(2n)^{\underline{n}}}{2^n} \tag{6.42} $$

suitable permutations, because the two appearances of $n$ must be adjacent
and there are $2n - 1$ places to insert them within a permutation for $n - 1$.
For example, when $n = 3$ the permutation 1221 has five insertion points,
yielding 331221, 133221, 123321, 122331, and 122133. Recurrence (6.41) can
be proved by extending the argument we used for ordinary Eulerian numbers.
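
The Gessel and Stanley interpretation can be confirmed by exhaustive search for small $n$. In the sketch below, which is only an illustration, we assume that recurrence (6.41) has coefficients $k+1$ and $2n-1-k$ (the form suggested by the remark that $n$ is replaced by $2n-1$ in one place of (6.35)); all function names are our own.

```python
from itertools import permutations

def second_order_eulerian(n_max):
    """Rows 0..n_max, assuming coefficients (k+1) and (2n-1-k) in (6.41)."""
    table = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    table[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(n):
            left = table[n - 1][k - 1] if k > 0 else 0
            table[n][k] = (k + 1) * table[n - 1][k] + (2 * n - 1 - k) * left
    return table

def is_suitable(p):
    """True if everything between the two copies of m exceeds m, for every m."""
    for m in set(p):
        first = p.index(m)
        last = len(p) - 1 - p[::-1].index(m)
        if any(x < m for x in p[first + 1:last]):
            return False
    return True

def ascents(p):
    return sum(1 for a, b in zip(p, p[1:]) if a < b)

def brute_force_row(n):
    """Count suitable permutations of {1,1,...,n,n} by number of ascents."""
    multiset = [m for m in range(1, n + 1) for _ in range(2)]
    counts = [0] * n
    for p in set(permutations(multiset)):
        if is_suitable(p):
            counts[ascents(p)] += 1
    return counts

T = second_order_eulerian(4)
print(T[3][:3], brute_force_row(3))  # [1, 8, 6] [1, 8, 6]
```

The row sum 1 + 8 + 6 = 15 = 5 · 3 · 1 agrees with (6.42).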
Second-order Eulerian numbers are important chiefly because of their
connection with Stirling numbers [119]: We have, by induction on $n$,

$$ {x \brace x-n} = \sum_k \left\langle\!\left\langle{n \atop k}\right\rangle\!\right\rangle \binom{x+n-1-k}{2n} ; \tag{6.43} $$

$$ {x \brack x-n} = \sum_k \left\langle\!\left\langle{n \atop k}\right\rangle\!\right\rangle \binom{x+k}{2n} . \tag{6.44} $$

(We already encountered the case $n = 1$ in (6.7).) These identities hold
whenever $x$ is an integer and $n$ is a nonnegative integer. Since the right-hand
sides are polynomials in $x$, we can use (6.43) and (6.44) to define Stirling
numbers ${x \brace x-n}$ and ${x \brack x-n}$ for arbitrary real (or complex) values of $x$.

If $n > 0$, these polynomials ${x \brace x-n}$ and ${x \brack x-n}$ are zero when $x = 0$, $x = 1$,
$\ldots$, and $x = n$; therefore they are divisible by $(x-0)$, $(x-1)$, $\ldots$, and $(x-n)$.
It’s interesting to look at what’s left after these known factors are divided out.
We define the Stirling polynomials $\sigma_n(x)$ by the rule

$$ \sigma_n(x) = {x \brack x-n} \Big/ \bigl( x(x-1) \ldots (x-n) \bigr) . \tag{6.45} $$

(The degree of $\sigma_n(x)$ is $n - 1$.) The first few cases are

$$ \begin{aligned}
\sigma_0(x) &= 1/x ; \\
\sigma_1(x) &= 1/2 ; \\
\sigma_2(x) &= (3x - 1)/24 ; \\
\sigma_3(x) &= (x^2 - x)/48 ; \\
\sigma_4(x) &= (15x^3 - 30x^2 + 5x + 2)/5760 .
\end{aligned} $$

So 1/x is a
polynomial?
(Sorry about that.)

They can be computed via the second-order Eulerian numbers; for example,

$$ \sigma_3(x) = \bigl( (x-4)(x-5) + 8(x+1)(x-4) + 6(x+2)(x+1) \bigr) / 6! . $$



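Table 258 Stirling convolution formulas.

$$ rs \sum_{k=0}^{n} \sigma_k(r)\, \sigma_{n-k}(s) = (r+s)\, \sigma_n(r+s) \tag{6.46} $$

$$ s \sum_{k=0}^{n} k\, \sigma_k(r)\, \sigma_{n-k}(s) = n\, \sigma_n(r+s) \tag{6.47} $$

$$ rs \sum_{k=0}^{n} \sigma_k(r+k)\, \sigma_{n-k}(s+n-k) = (r+s)\, \sigma_n(r+s+n) \tag{6.48} $$

$$ s \sum_{k=0}^{n} k\, \sigma_k(r+k)\, \sigma_{n-k}(s+n-k) = n\, \sigma_n(r+s+n) \tag{6.49} $$

$$ {n \brace m} = (-1)^{n-m+1} \frac{n!}{(m-1)!}\, \sigma_{n-m}(-m) \tag{6.50} $$

$$ {n \brack m} = \frac{n!}{(m-1)!}\, \sigma_{n-m}(n) \tag{6.51} $$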

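It turns out that these polynomials satisfy two very pretty identities:

$$ \left( \frac{z e^z}{e^z - 1} \right)^{\!x} = x \sum_{n \ge 0} \sigma_n(x)\, z^n ; \tag{6.52} $$

$$ \left( \frac{1}{z} \ln \frac{1}{1-z} \right)^{\!x} = x \sum_{n \ge 0} \sigma_n(x+n)\, z^n . \tag{6.53} $$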

Therefore  we  can  obtain  general  convolution  formulas  for  Stirling  numbers,  as 
we  did  for  binomial  coefficients  in  Table  202;  the  results  appear  in  Table  258. 
When  a sum  of  Stirling  numbers  doesn’t  fit  the  identities  of  Table  250  or  251, 
Table  258  may  be  just  the  ticket.  (An  example  appears  later  in  this  chapter, 
following  equation  (6.100).  Exercise  7.19  discusses  the  general  principles  of 
convolutions  based  on  identities  like  (6.52)  and  (6.53).) 


6.3  HARMONIC  NUMBERS 

It’s  time  now  to  take  a closer  look  at  harmonic  numbers,  which  we 
first  met  back  in  Chapter  2: 

$$ H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \sum_{k=1}^{n} \frac{1}{k} , \qquad \text{integer } n \ge 0. \tag{6.54} $$

These numbers appear so often in the analysis of algorithms that computer
scientists need a special notation for them. We use $H_n$, the ‘H’ standing for
“harmonic,” since a tone of wavelength $1/n$ is called the $n$th harmonic of a
tone whose wavelength is 1. The first few values look like this:


n     0   1   2     3      4       5        6       7         8         9           10
H_n   0   1   3/2   11/6   25/12   137/60   49/20   363/140   761/280   7129/2520   7381/2520

This must be
Table 259.

Exercise  21  shows  that  Hn  is  never  an  integer  when  n > 1. 

Here’s  a card  trick,  based  on  an  idea  by  R.  T.  Sharp  [264],  that  illustrates 
how  the  harmonic  numbers  arise  naturally  in  simple  situations.  Given  n cards 
and  a table,  we’d  like  to  create  the  largest  possible  overhang  by  stacking  the 
cards  up  over  the  table’s  edge,  subject  to  the  laws  of  gravity: 


To  define  the  problem  a bit  more,  we  require  the  edges  of  the  cards  to  be 
parallel  to  the  edge  of  the  table;  otherwise  we  could  increase  the  overhang  by 
rotating  the  cards  so  that  their  corners  stick  out  a little  farther.  And  to  make 
the  answer  simpler,  we  assume  that  each  card  is  2 units  long. 

With  one  card,  we  get  maximum  overhang  when  its  center  of  gravity  is 
just  above  the  edge  of  the  table.  The  center  of  gravity  is  in  the  middle  of  the 
card,  so  we  can  create  half  a cardlength,  or  1 unit,  of  overhang. 

With  two  cards,  it’s  not  hard  to  convince  ourselves  that  we  get  maximum 
overhang  when  the  center  of  gravity  of  the  top  card  is  just  above  the  edge 
of  the  second  card,  and  the  center  of  gravity  of  both  cards  combined  is  just 
above  the  edge  of  the  table.  The  joint  center  of  gravity  of  two  cards  will  be 
in  the  middle  of  their  common  part,  so  we  are  able  to  achieve  an  additional 
half  unit  of  overhang. 

This pattern suggests a general method, where we place cards so that the
center of gravity of the top $k$ cards lies just above the edge of the $(k+1)$st card
(which supports those top $k$). The table plays the role of the $(n+1)$st card. To
express this condition algebraically, we can let $d_k$ be the distance from the
extreme edge of the top card to the corresponding edge of the $k$th card from
the top. Then $d_1 = 0$, and we want to make $d_{k+1}$ the center of gravity of the
first $k$ cards:

$$ d_{k+1} = \frac{(d_1 + 1) + (d_2 + 1) + \cdots + (d_k + 1)}{k} , \qquad \text{for } 1 \le k \le n. \tag{6.55} $$
(The center of gravity of $k$ objects, having respective weights $w_1, \ldots, w_k$
and having respective centers of gravity at positions $p_1, \ldots, p_k$, is at position
$(w_1 p_1 + \cdots + w_k p_k)/(w_1 + \cdots + w_k)$.) We can rewrite this recurrence in two
equivalent forms

$$ \begin{aligned}
k\,d_{k+1} &= k + d_1 + \cdots + d_{k-1} + d_k , & k &\ge 0; \\
(k-1)\,d_k &= k - 1 + d_1 + \cdots + d_{k-1} , & k &\ge 1.
\end{aligned} $$

Subtracting these equations tells us that

$$ k\,d_{k+1} - (k-1)\,d_k = 1 + d_k , \qquad k \ge 1; $$

hence $d_{k+1} = d_k + 1/k$. The second card will be offset half a unit past the
third, which is a third of a unit past the fourth, and so on. The general
formula

$$ d_{k+1} = H_k \tag{6.56} $$

follows by induction, and if we set $k = n$ we get $d_{n+1} = H_n$ as the total
overhang when $n$ cards are stacked as described.
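
The center-of-gravity recurrence (6.55) can be evaluated directly, without using the simplification to $d_{k+1} = d_k + 1/k$, to confirm that it really produces harmonic numbers. The sketch below uses exact rational arithmetic; the names `overhangs` and `harmonic` are our own.

```python
from fractions import Fraction

def overhangs(n):
    """Return [None, d_1, ..., d_{n+1}] computed from (6.55), with d_1 = 0."""
    d = [None, Fraction(0)]  # d[0] is unused so that indices match the text
    for k in range(1, n + 1):
        # center of gravity of the top k cards, each 2 units long
        d.append(sum(d[j] + 1 for j in range(1, k + 1)) / k)
    return d

def harmonic(n):
    """Exact value of H_n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

d = overhangs(4)
print(d[5])                     # 25/12
print(float(harmonic(52)) / 2)  # overhang of 52 cards, in cardlengths
```

Four cards already give an overhang of 25/12 units, which exceeds one cardlength of 2 units.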

Could we achieve greater overhang by holding back, not pushing each
card to an extreme position but storing up “potential gravitational energy”
for a later advance? No; any well-balanced card placement has

$$ d_{k+1} \le \frac{(1 + d_1) + (1 + d_2) + \cdots + (1 + d_k)}{k} , \qquad 1 \le k \le n. $$

Furthermore $d_1 = 0$. It follows by induction that $d_{k+1} \le H_k$.

Notice that it doesn’t take too many cards for the top one to be completely
past the edge of the table. We need an overhang of more than one
cardlength, which is 2 units. The first harmonic number to exceed 2 is
$H_4 = \frac{25}{12}$, so we need only four cards.

And with 52 cards we have an $H_{52}$-unit overhang, which turns out to be
$H_{52}/2 \approx 2.27$ cardlengths. (We will soon learn a formula that tells us how to
compute an approximate value of $H_n$ for large $n$ without adding up a whole
bunch of fractions.)

Anyone who actually tries to achieve
this maximum overhang with 52 cards is probably
not dealing with a full deck, or
maybe he's a real joker.

An amusing problem called the “worm on the rubber band” shows harmonic
numbers in another guise. A slow but persistent worm, W, starts at
one end of a meter-long rubber band and crawls one centimeter per minute
toward the other end. At the end of each minute, an equally persistent keeper
of the band, K, whose sole purpose in life is to frustrate W, stretches it one
meter. Thus after one minute of crawling, W is 1 centimeter from the start
and 99 from the finish; then K stretches it one meter. During the stretching
operation W maintains his relative position, 1% from the start and 99% from
the finish; so W is now 2 cm from the starting point and 198 cm from the
goal. After W crawls for another minute the score is 3 cm traveled and 197
to go; but K stretches, and the distances become 4.5 and 295.5. And so on.
Does the worm ever reach the finish? He keeps moving, but the goal seems to
move away even faster. (We’re assuming an infinite longevity for K and W,
an infinite elasticity of the band, and an infinitely tiny worm.)

Metric units make
this problem more
scientific.

A flatworm, eh?

Let’s write down some formulas. When K stretches the rubber band, the
fraction of it that W has crawled stays the same. Thus he crawls 1/100th of
it the first minute, 1/200th the second, 1/300th the third, and so on. After
$n$ minutes the fraction of the band that he’s crawled is

$$ \frac{1}{100} \left( \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \right) = \frac{H_n}{100} . \tag{6.57} $$

So he reaches the finish if $H_n$ ever surpasses 100.

We’ll see how to estimate $H_n$ for large $n$ soon; for now, let’s simply
check our analysis by considering how “Superworm” would perform in the
same situation. Superworm, unlike W, can crawl 50 cm per minute; so she
will crawl $H_n/2$ of the band length after $n$ minutes, according to the argument
we just gave. If our reasoning is correct, Superworm should finish before $n$
reaches 4, since $H_4 > 2$. And yes, a simple calculation shows that Superworm
has only $33\frac{1}{3}$ cm left to travel after three minutes have elapsed. She finishes
in 3 minutes and 40 seconds flat.
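
Superworm's race is short enough to simulate minute by minute. The following sketch is a direct simulation under the stated rules (crawl, then stretch while keeping the relative position); the function name `minutes_to_finish` and its parameters are hypothetical. It should not be called with the speed of W himself, since his finishing time is astronomically large.

```python
from fractions import Fraction

def minutes_to_finish(speed_cm, band_cm=100, stretch_cm=100):
    """Exact time, in minutes, at which a worm of the given speed finishes."""
    position = Fraction(0)
    length = Fraction(band_cm)
    minute = 0
    while True:
        if position + speed_cm >= length:
            # the finish is reached partway through this minute
            return minute + (length - position) / speed_cm
        position += speed_cm
        minute += 1
        # K stretches the band; the worm keeps its relative position
        position = position * (length + stretch_cm) / length
        length += stretch_cm

print(minutes_to_finish(50))  # 11/3, that is, 3 minutes and 40 seconds
```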

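Harmonic numbers appear also in Stirling’s triangle. Let’s try to find a
closed form for ${n \brack 2}$, the number of permutations of $n$ objects that have exactly
two cycles. Recurrence (6.8) tells us that

$$ {n+1 \brack 2} = n{n \brack 2} + {n \brack 1} = n{n \brack 2} + (n-1)! , \qquad \text{if } n > 0; $$

and this recurrence is a natural candidate for the summation factor technique
of Chapter 2:

$$ \frac{1}{n!} {n+1 \brack 2} = \frac{1}{(n-1)!} {n \brack 2} + \frac{1}{n} . $$

Unfolding this recurrence tells us that $\frac{1}{n!}{n+1 \brack 2} = H_n$; hence

$$ {n+1 \brack 2} = n!\,H_n . \tag{6.58} $$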
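
The closed form (6.58) can be compared with the recurrence it came from. In this sketch, an illustration under our own naming, `two_cycle_counts` unfolds the special case of (6.8) for two cycles, using the fact from (6.5) that $(n-1)!$ arrangements of $n$ objects form a single cycle.

```python
from fractions import Fraction
from math import factorial

def two_cycle_counts(n_max):
    """c[n] = number of permutations of n objects with exactly two cycles."""
    c = [0, 0]  # no such permutations of 0 objects or of 1 object
    for n in range(1, n_max):
        # (6.8) with k = 2, using (n-1)! for the one-cycle count
        c.append(n * c[n] + factorial(n - 1))
    return c

def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))

c = two_cycle_counts(10)
print(c[4], factorial(3) * harmonic(3))  # 11 11
```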


We proved in Chapter 2 that the harmonic series $\sum_k 1/k$ diverges, which
means that $H_n$ gets arbitrarily large as $n \to \infty$. But our proof was indirect;
we found that a certain infinite sum (2.58) gave different answers when it was
rearranged, hence $\sum_k 1/k$ could not be bounded. The fact that $H_n \to \infty$
seems counter-intuitive, because it implies among other things that a large
enough stack of cards will overhang a table by a mile or more, and that the
worm W will eventually reach the end of his rope. Let us therefore take a
closer look at the size of $H_n$ when $n$ is large.

The simplest way to see that $H_n \to \infty$ is probably to group its terms
according to powers of 2. We put one term into group 1, two terms into
group 2, four into group 3, eight into group 4, and so on:

$$ \underbrace{\frac{1}{1}}_{\text{group 1}}
 + \underbrace{\frac{1}{2} + \frac{1}{3}}_{\text{group 2}}
 + \underbrace{\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7}}_{\text{group 3}}
 + \underbrace{\frac{1}{8} + \frac{1}{9} + \frac{1}{10} + \frac{1}{11} + \frac{1}{12} + \frac{1}{13} + \frac{1}{14} + \frac{1}{15}}_{\text{group 4}}
 + \cdots $$

Both terms in group 2 are between $\frac14$ and $\frac12$, so the sum of that group is
between $2 \cdot \frac14 = \frac12$ and $2 \cdot \frac12 = 1$. All four terms in group 3 are between $\frac18$
and $\frac14$, so their sum is also between $\frac12$ and 1. In fact, each of the $2^{k-1}$ terms
in group $k$ is between $2^{-k}$ and $2^{1-k}$; hence the sum of each individual group
is between $\frac12$ and 1.

This grouping procedure tells us that if $n$ is in group $k$, we must have
$H_n > k/2$ and $H_n \le k$ (by induction on $k$). Thus $H_n \to \infty$, and in fact

$$ \frac{\lfloor \lg n \rfloor + 1}{2} < H_n \le \lfloor \lg n \rfloor + 1 . \tag{6.59} $$

We should call them
the worm numbers,
they're so slow.

We now know $H_n$ within a factor of 2. Although the harmonic numbers
approach infinity, they approach it only logarithmically; that is, quite slowly.

Better bounds can be found with just a little more work and a dose
of calculus. We learned in Chapter 2 that $H_n$ is the discrete analog of the
continuous function $\ln n$. The natural logarithm is defined as the area under
a curve, so a geometric comparison is suggested:

[Figure: the curve 1/x with n rectangles of heights 1, 1/2, ..., 1/n lying above it]

The area under the curve between 1 and $n$, which is $\int_1^n dx/x = \ln n$, is less
than the area of the $n$ rectangles, which is $\sum_{k=1}^{n} 1/k = H_n$. Thus $\ln n < H_n$;
this is a sharper result than we had in (6.59). And by placing the rectangles
a little differently, we get a similar upper bound:

[Figure: the same rectangles shifted one unit left, so that all but the first lie below the curve]

This time the area of the $n$ rectangles, $H_n$, is less than the area of the first
rectangle plus the area under the curve. We have proved that

$$ \ln n < H_n < \ln n + 1 , \qquad \text{for } n > 1. \tag{6.60} $$

“I now see a way
too how ye aggregate
of ye termes
of Musicall progressions
may bee
found (much after
ye same manner)
by Logarithms, but
ye calculations for
finding out those
rules would bee still
more troublesom.”
-I. Newton [223]


We  now  know  the  value  of  Hn  with  an  error  of  at  most  1. 
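
Both sets of bounds are simple to test numerically. The sketch below is our own check, not part of the argument; it uses floating-point sums, which are accurate enough here because the bounds hold with plenty of room, and it relies on the fact that `n.bit_length()` equals the group number $\lfloor \lg n \rfloor + 1$ for a positive integer $n$.

```python
from math import log

def harmonic(n):
    """Floating-point approximation to H_n."""
    return sum(1.0 / k for k in range(1, n + 1))

def check_bounds(n):
    """Return (coarse, fine): whether the bounds (6.59) and (6.60) hold for n."""
    h = harmonic(n)
    group = n.bit_length()                  # floor(lg n) + 1
    coarse = group / 2 < h <= group         # (6.59)
    fine = log(n) < h < log(n) + 1          # (6.60), stated for n > 1
    return coarse, fine

print(check_bounds(1000))  # (True, True)
```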

“Second order” harmonic numbers $H_n^{(2)}$ arise when we sum the squares
of the reciprocals, instead of summing simply the reciprocals:

$$ H_n^{(2)} = 1 + \frac{1}{4} + \frac{1}{9} + \cdots + \frac{1}{n^2} = \sum_{k=1}^{n} \frac{1}{k^2} . $$

Similarly, we define harmonic numbers of order $r$ by summing $(-r)$th powers:

$$ H_n^{(r)} = \sum_{k=1}^{n} \frac{1}{k^r} . \tag{6.61} $$

If $r > 1$, these numbers approach a limit as $n \to \infty$; we noted in Chapter 4
that this limit is conventionally called Riemann’s zeta function:

$$ \zeta(r) = H_\infty^{(r)} = \sum_{k \ge 1} \frac{1}{k^r} . \tag{6.62} $$

Euler  discovered  a neat  way  to  use  generalized  harmonic  numbers  to 
approximate  the  ordinary  ones,  !.  Let’s  consider  the  infinite  series 

, ( k \ 1 1 1 1 

Vic^T ) ~ k + Ii?  + 3i?  + 4i?  + '"’  (d63) 

which converges when k > 1. The left-hand side is ln k - ln(k - 1); therefore if we sum both sides for 2 ≤ k ≤ n the left-hand sum telescopes and we get

$$\begin{aligned}
\ln n - \ln 1 &= \sum_{k=2}^n \Bigl(\frac1k + \frac1{2k^2} + \frac1{3k^3} + \frac1{4k^4} + \cdots\Bigr) \\
&= (H_n - 1) + \tfrac12\bigl(H_n^{(2)} - 1\bigr) + \tfrac13\bigl(H_n^{(3)} - 1\bigr) + \tfrac14\bigl(H_n^{(4)} - 1\bigr) + \cdots.
\end{aligned}$$

Rearranging, we have an expression for the difference between $H_n$ and ln n:

$$H_n - \ln n = 1 - \tfrac12\bigl(H_n^{(2)} - 1\bigr) - \tfrac13\bigl(H_n^{(3)} - 1\bigr) - \tfrac14\bigl(H_n^{(4)} - 1\bigr) - \cdots.$$

When n → ∞, the right-hand side approaches the limiting value

$$1 - \tfrac12\bigl(\zeta(2) - 1\bigr) - \tfrac13\bigl(\zeta(3) - 1\bigr) - \tfrac14\bigl(\zeta(4) - 1\bigr) - \cdots,$$


which is now known as Euler's constant and conventionally denoted by the Greek letter γ. In fact, ζ(r) - 1 is approximately $1/2^r$, so this infinite series converges rather rapidly and we can compute the decimal value

$$\gamma = 0.5772156649\ldots. \tag{6.64}$$

“Huius igitur quantitatis constantis C valorem deteximus, quippe est C = 0,577218.”
—L. Euler [83]

Euler's argument establishes the limiting relation

$$\lim_{n\to\infty}(H_n - \ln n) = \gamma; \tag{6.65}$$

thus  Hn  lies  about  58%  of  the  way  between  the  two  extremes  in  (6.60).  We 
are  gradually  homing  in  on  its  value. 
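The limiting relation (6.65) can be watched in action. The sketch below is our own illustration (the names `EULER_GAMMA` and `gamma_estimate` are hypothetical); it assumes ordinary double-precision arithmetic and simply evaluates $H_n - \ln n$ for growing n.

```python
import math

EULER_GAMMA = 0.5772156649  # decimal value quoted in (6.64)


def gamma_estimate(n: int) -> float:
    """Return H_n - ln n, which tends to Euler's constant as n grows."""
    h = math.fsum(1.0 / k for k in range(1, n + 1))
    return h - math.log(n)


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        # The estimates decrease toward gamma; the gap is roughly 1/(2n).
        print(n, gamma_estimate(n), gamma_estimate(n) - EULER_GAMMA)
```

The gap shrinks only like 1/(2n), which is why the sharper formula with correction terms is worth having.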

Further  refinements  are  possible,  as  we  will  see  in  Chapter  9.  We  will 
prove,  for  example,  that 

$$H_n = \ln n + \gamma + \frac1{2n} - \frac1{12n^2} + \frac{\epsilon_n}{120n^4}, \qquad 0 < \epsilon_n < 1. \tag{6.66}$$

This  formula  allows  us  to  conclude  that  the  millionth  harmonic  number  is 


$$H_{1000000} \approx 14.3927267228657236313811275,$$

without  adding  up  a million  fractions.  Among  other  things,  this  implies  that 
a stack  of  a million  cards  can  overhang  the  edge  of  a table  by  more  than  seven 
cardlengths. 
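To see how little work the approximation needs, here is a small Python sketch of our own. It assumes the formula $H_n \approx \ln n + \gamma + 1/(2n) - 1/(12n^2)$, that is, (6.66) with the tiny $\epsilon_n$ term dropped; the names `harmonic_approx` and `harmonic_direct` are hypothetical, and double precision limits us to about 15 significant digits.

```python
import math

EULER_GAMMA = 0.57721566490153286  # Euler's constant to double precision


def harmonic_approx(n: int) -> float:
    """Approximate H_n by ln n + gamma + 1/(2n) - 1/(12 n^2), omitting the epsilon term."""
    return math.log(n) + EULER_GAMMA + 1 / (2 * n) - 1 / (12 * n * n)


def harmonic_direct(n: int) -> float:
    """Add up the n fractions one by one, for comparison."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


if __name__ == "__main__":
    n = 10**6
    print(harmonic_approx(n))   # four arithmetic operations
    print(harmonic_direct(n))   # a million additions
```

Both lines should agree to roughly the precision of the floating-point format, since the omitted term is smaller than $1/(120n^4)$.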

What does (6.66) tell us about the worm on the rubber band? Since $H_n$ is unbounded, the worm will definitely reach the end, when $H_n$ first exceeds 100. Our approximation to $H_n$ says that this will happen when n is approximately

$$e^{100-\gamma} \approx e^{99.423}.$$

In fact, exercise 9.49 proves that the critical value of n is either $\lfloor e^{100-\gamma}\rfloor$ or $\lceil e^{100-\gamma}\rceil$. We can imagine W's triumph when he crosses the finish line at last, much to K's chagrin, some 287 decillion centuries after his long crawl began. (The rubber band will have stretched to more than $10^{27}$ light years long; its molecules will be pretty far apart.)

Well, they can't really go at it this long; the world will have ended much earlier, when the Tower of Brahma is fully transferred.
6.4 HARMONIC SUMMATION

Now  let’s  look  at  some  sums  involving  harmonic  numbers,  starting 
with  a review  of  a few  ideas  we  learned  in  Chapter  2.  We  proved  in  (2.36) 
and (2.57) that

$$\sum_{0 \le k < n} H_k = nH_n - n; \tag{6.67}$$

$$\sum_{0 \le k < n} kH_k = \frac{n(n-1)}{2}H_n - \frac{n(n-1)}{4}. \tag{6.68}$$


Let’s  be  bold  and  take  on  a more  general  sum,  which  includes  both  of  these 
as special cases: What is the value of

$$\sum_{0 \le k < n} \binom{k}{m} H_k$$

when m is a nonnegative integer?

The  approach  that  worked  best  for  (6.67)  and  (6.68)  in  Chapter  2 was 
called summation by parts. We wrote the summand in the form u(k)Δv(k), and we applied the general identity

$$\sum\nolimits_a^b u(x)\,\Delta v(x)\,\delta x = u(x)v(x)\Big|_a^b - \sum\nolimits_a^b v(x+1)\,\Delta u(x)\,\delta x. \tag{6.69}$$

Remember? The sum that faces us now, $\sum_{0 \le k < n}\binom{k}{m}H_k$, is a natural for this method because we can let


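$$\begin{aligned}
u(k) &= H_k, & \Delta u(k) &= H_{k+1} - H_k = \frac{1}{k+1}; \\
v(k) &= \binom{k}{m+1}, & \Delta v(k) &= \binom{k+1}{m+1} - \binom{k}{m+1} = \binom{k}{m}.
\end{aligned}$$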

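(In other words, harmonic numbers have a simple Δ and binomial coefficients have a simple $\Delta^{-1}$, so we're in business.) Plugging into (6.69) yields

$$\begin{aligned}
\sum_{0 \le k < n}\binom{k}{m}H_k &= \sum\nolimits_0^n \binom{x}{m}H_x\,\delta x = \binom{x}{m+1}H_x\Big|_0^n - \sum\nolimits_0^n \binom{x+1}{m+1}\frac{\delta x}{x+1} \\
&= \binom{n}{m+1}H_n - \sum_{0 \le k < n}\binom{k+1}{m+1}\frac{1}{k+1}.
\end{aligned}$$

The remaining sum is easy, since we can absorb the $(k+1)^{-1}$ using our old standby, equation (5.5):

$$\sum_{0 \le k < n}\binom{k+1}{m+1}\frac{1}{k+1} = \sum_{0 \le k < n}\binom{k}{m}\frac{1}{m+1} = \binom{n}{m+1}\frac{1}{m+1}.$$

Thus we have the answer we seek:

$$\sum_{0 \le k < n}\binom{k}{m}H_k = \binom{n}{m+1}\Bigl(H_n - \frac{1}{m+1}\Bigr). \tag{6.70}$$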


(This  checks  nicely  with  (6.67)  and  (6.68)  when  m = 0 and  m = 1.) 
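A closed form like (6.70), which states that $\sum_{0 \le k < n}\binom{k}{m}H_k = \binom{n}{m+1}\bigl(H_n - \frac{1}{m+1}\bigr)$, is easy to double-check with exact rational arithmetic. The following Python sketch is our own; the names `harmonic`, `lhs` and `rhs` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction
from math import comb


def harmonic(n: int) -> Fraction:
    """Return H_n exactly as a fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def lhs(n: int, m: int) -> Fraction:
    """Sum of C(k, m) * H_k over 0 <= k < n, term by term."""
    return sum((comb(k, m) * harmonic(k) for k in range(n)), Fraction(0))


def rhs(n: int, m: int) -> Fraction:
    """Closed form C(n, m+1) * (H_n - 1/(m+1))."""
    return comb(n, m + 1) * (harmonic(n) - Fraction(1, m + 1))


if __name__ == "__main__":
    for m in range(3):
        print(m, [str(lhs(n, m)) for n in range(1, 6)])
```

Setting m = 0 and m = 1 reproduces the two special cases $nH_n - n$ and $\frac{n(n-1)}{2}H_n - \frac{n(n-1)}{4}$.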

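The next example sum uses division instead of multiplication: Let us try to evaluate

$$S_n = \sum_{k=1}^n \frac{H_k}{k}.$$

If we expand $H_k$ by its definition, we obtain a double sum,

$$S_n = \sum_{1 \le j \le k \le n} \frac{1}{j \cdot k}.$$

Now another method from Chapter 2 comes to our aid; equation (2.33) tells us that

$$S_n = \frac12\Biggl(\Bigl(\sum_{k=1}^n \frac1k\Bigr)^2 + \sum_{k=1}^n \frac1{k^2}\Biggr) = \frac12\bigl(H_n^2 + H_n^{(2)}\bigr). \tag{6.71}$$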
It  turns  out  that  we  could  also  have  obtained  this  answer  in  another  way  if 
we  had  tried  to  sum  by  parts  (see  exercise  26). 
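The identity $\sum_{k=1}^n H_k/k = \frac12\bigl(H_n^2 + H_n^{(2)}\bigr)$ can likewise be confirmed exactly for small n. This is a minimal sketch of our own in Python; `harmonic`, `s_direct` and `s_closed` are hypothetical names.

```python
from fractions import Fraction


def harmonic(n: int, r: int = 1) -> Fraction:
    """Return the generalized harmonic number H_n^(r) = sum of 1/k^r for 1 <= k <= n."""
    return sum((Fraction(1, k**r) for k in range(1, n + 1)), Fraction(0))


def s_direct(n: int) -> Fraction:
    """Sum H_k / k for 1 <= k <= n, straight from the definition."""
    return sum((harmonic(k) / k for k in range(1, n + 1)), Fraction(0))


def s_closed(n: int) -> Fraction:
    """Closed form (H_n^2 + H_n^(2)) / 2."""
    return (harmonic(n) ** 2 + harmonic(n, 2)) / 2


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, s_direct(n), s_closed(n))
```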

Now  let’s  try  our  hands  at  a more  difficult  problem  [291],  which  doesn’t 
submit  to  summation  by  parts: 

$$U_n = \sum_{k \ge 1}\binom{n}{k}\frac{(-1)^{k-1}}{k}(n-k)^n, \qquad \text{integer } n \ge 1.$$

(This sum doesn't explicitly mention harmonic numbers either; but who knows when they might turn up?)

(Not to give the answer away or anything.)

We will solve this problem in two ways, one by grinding out the answer and the other by being clever and/or lucky. First, the grinder's approach. We expand $(n-k)^n$ by the binomial theorem, so that the troublesome k in the denominator will combine with the numerator:

$$\begin{aligned}
U_n &= \sum_{k \ge 1}\binom{n}{k}\frac{(-1)^{k-1}}{k}\sum_j \binom{n}{j}(-k)^j n^{n-j} \\
&= \sum_j \binom{n}{j}(-1)^{j-1}n^{n-j}\sum_{k \ge 1}\binom{n}{k}(-1)^k k^{j-1}.
\end{aligned}$$

This isn't quite the mess it seems, because the $k^{j-1}$ in the inner sum is a polynomial in k, and identity (5.40) tells us that we are simply taking the nth difference of this polynomial. Almost; first we must clean up a few things. For one, $k^{j-1}$ isn't a polynomial if j = 0; so we will need to split off that term and handle it separately. For another, we're missing the term k = 0 from the formula for nth difference; that term is nonzero when j = 1, so we had better restore it (and subtract it out again). The result is

$$\begin{aligned}
U_n = {} & \sum_{j \ge 1}\binom{n}{j}(-1)^{j-1}n^{n-j}\sum_{k \ge 0}\binom{n}{k}(-1)^k k^{j-1} \\
& - \sum_{j \ge 1}\binom{n}{j}(-1)^{j-1}n^{n-j}\binom{n}{0}0^{j-1} \\
& - \binom{n}{0}n^n\sum_{k \ge 1}\binom{n}{k}(-1)^k k^{-1}.
\end{aligned}$$

OK, now the top line (the only remaining double sum) is zero: It's the sum of multiples of nth differences of polynomials of degree less than n, and such nth differences are zero. The second line is zero except when j = 1, when it equals $-n^n$. So the third line is the only residual difficulty; we have reduced the original problem to a much simpler sum:

$$U_n = n^n(T_n - 1), \qquad \text{where } T_n = \sum_{k \ge 1}\binom{n}{k}\frac{(-1)^{k-1}}{k}. \tag{6.72}$$

For example, $U_3 = \binom31\frac81 - \binom32\frac12 = \frac{45}{2}$; $T_3 = \binom31\frac11 - \binom32\frac12 + \binom33\frac13 = \frac{11}{6}$; hence $U_3 = 27(T_3 - 1)$ as claimed.

How can we evaluate $T_n$? One way is to replace $\binom{n}{k}$ by $\binom{n-1}{k} + \binom{n-1}{k-1}$, obtaining a simple recurrence for $T_n$ in terms of $T_{n-1}$. But there's a more instructive way: We had a similar formula in (5.41), namely

$$\sum_k \binom{n}{k}\frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\ldots(x+n)}.$$

If we subtract out the term for k = 0 and set x = 0, we get $-T_n$. So let's do it:

$$\begin{aligned}
T_n &= \left.\left(\frac1x - \frac{n!}{x(x+1)\ldots(x+n)}\right)\right|_{x=0} \\
&= \left.\frac{(x+1)\ldots(x+n) - n!}{x(x+1)\ldots(x+n)}\right|_{x=0} \\
&= \left.\frac{x^n{n+1 \brack n+1} + \cdots + x{n+1 \brack 2} + {n+1 \brack 1} - n!}{x(x+1)\ldots(x+n)}\right|_{x=0} = \frac{1}{n!}{n+1 \brack 2}.
\end{aligned}$$

(We have used the expansion (6.11) of $(x+1)\ldots(x+n) = x^{\overline{n+1}}/x$; we can divide x out of the numerator because ${n+1 \brack 1} = n!$.) But we know from (6.58) that ${n+1 \brack 2} = n!\,H_n$; hence $T_n = H_n$, and we have the answer:

$$U_n = n^n(H_n - 1). \tag{6.73}$$

That’s  one  approach.  The  other  approach  will  be  to  try  to  evaluate  a 
much  more  general  sum, 

$$U_n(x, y) = \sum_{k \ge 1}\binom{n}{k}\frac{(-1)^{k-1}}{k}(x + ky)^n, \qquad \text{integer } n \ge 0; \tag{6.74}$$

the  value  of  the  original  Un  will  drop  out  as  the  special  case  Un(n,  -1).  (We 
are  encouraged  to  try  for  more  generality  because  the  previous  derivation 
“threw  away”  most  of  the  details  of  the  given  problem;  somehow  those  details 
must  be  irrelevant,  because  the  nth  difference  wiped  them  away.) 

We  could  replay  the  previous  derivation  with  small  changes  and  discover 
the value of $U_n(x, y)$. Or we could replace $(x+ky)^n$ by $(x+ky)^{n-1}(x+ky)$ and then replace $\binom{n}{k}$ by $\binom{n-1}{k} + \binom{n-1}{k-1}$, leading to the recurrence

$$U_n(x, y) = xU_{n-1}(x, y) + x^n/n + yx^{n-1}; \tag{6.75}$$

this  can  readily  be  solved  with  a summation  factor  (exercise  5). 

But  it’s  easiest  to  use  another  trick  that  worked  to  our  advantage  in 
Chapter  2:  differentiation.  The  derivative  of  Un  (x,  y ) with  respect  to  y brings 
out  a k that  cancels  with  the  k in  the  denominator,  and  the  resulting  sum  is 
trivial: 

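$$\begin{aligned}
\frac{\partial}{\partial y}U_n(x, y) &= \sum_{k \ge 1}\binom{n}{k}(-1)^{k-1}n(x+ky)^{n-1} \\
&= \binom{n}{0}nx^{n-1} - \sum_{k \ge 0}\binom{n}{k}(-1)^k n(x+ky)^{n-1} = nx^{n-1}.
\end{aligned}$$

(Once again, the nth difference of a polynomial of degree less than n has vanished.)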

We've proved that the derivative of $U_n(x, y)$ with respect to y is $nx^{n-1}$, independent of y. In general, if f'(y) = c then f(y) = f(0) + cy; therefore we must have $U_n(x, y) = U_n(x, 0) + nx^{n-1}y$.

The  remaining  task  is  to  determine  Un  (x,  0).  But  Un(x,  0)  is  just  xn 
times  the  sum  Tn  = Hn  we’ve  already  considered  in  (6.72);  therefore  the 
general  sum  in  (6.74)  has  the  closed  form 

$$U_n(x, y) = x^nH_n + nx^{n-1}y. \tag{6.76}$$

In particular, the solution to the original problem is $U_n(n, -1) = n^n(H_n - 1)$.
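Both derivations lead to the closed form $U_n(x, y) = x^nH_n + nx^{n-1}y$ for the sum $\sum_{k \ge 1}\binom{n}{k}\frac{(-1)^{k-1}}{k}(x+ky)^n$, and a quick exact computation makes a reassuring sanity check. The Python sketch below is our own illustration; the names `u_direct` and `u_closed` are hypothetical, and we restrict attention to n ≥ 1.

```python
from fractions import Fraction
from math import comb


def harmonic(n: int) -> Fraction:
    """Return H_n exactly."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def u_direct(n: int, x, y) -> Fraction:
    """Sum C(n,k) * (-1)^(k-1) / k * (x + k*y)^n over 1 <= k <= n."""
    return sum(
        (Fraction(comb(n, k) * (-1) ** (k - 1), k) * (x + k * y) ** n
         for k in range(1, n + 1)),
        Fraction(0),
    )


def u_closed(n: int, x, y) -> Fraction:
    """Closed form x^n * H_n + n * x^(n-1) * y, for n >= 1."""
    return x**n * harmonic(n) + n * x ** (n - 1) * y


if __name__ == "__main__":
    # The original problem is the special case x = n, y = -1.
    for n in range(1, 6):
        print(n, u_direct(n, n, -1), n**n * (harmonic(n) - 1))
```

For n = 3 the special case x = n, y = -1 gives 45/2, in agreement with the hand computation of $U_3$.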
6.5 BERNOULLI NUMBERS

The next important sequence of numbers on our agenda is named after Jakob Bernoulli (1654-1705), who discovered curious relationships while working out the formulas for sums of mth powers [22]. Let's write

$$S_m(n) = 0^m + 1^m + \cdots + (n-1)^m = \sum_{k=0}^{n-1}k^m = \sum\nolimits_0^n x^m\,\delta x. \tag{6.77}$$


(Thus, when m > 0 we have $S_m(n) = H_{n-1}^{(-m)}$ in the notation of generalized harmonic numbers.) Bernoulli looked at the following sequence of formulas and spotted a pattern:


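$$\begin{aligned}
S_0(n) &= n \\
S_1(n) &= \tfrac12 n^2 - \tfrac12 n \\
S_2(n) &= \tfrac13 n^3 - \tfrac12 n^2 + \tfrac16 n \\
S_3(n) &= \tfrac14 n^4 - \tfrac12 n^3 + \tfrac14 n^2 \\
S_4(n) &= \tfrac15 n^5 - \tfrac12 n^4 + \tfrac13 n^3 - \tfrac1{30} n \\
S_5(n) &= \tfrac16 n^6 - \tfrac12 n^5 + \tfrac5{12} n^4 - \tfrac1{12} n^2 \\
S_6(n) &= \tfrac17 n^7 - \tfrac12 n^6 + \tfrac12 n^5 - \tfrac16 n^3 + \tfrac1{42} n \\
S_7(n) &= \tfrac18 n^8 - \tfrac12 n^7 + \tfrac7{12} n^6 - \tfrac7{24} n^4 + \tfrac1{12} n^2 \\
S_8(n) &= \tfrac19 n^9 - \tfrac12 n^8 + \tfrac23 n^7 - \tfrac7{15} n^5 + \tfrac29 n^3 - \tfrac1{30} n \\
S_9(n) &= \tfrac1{10} n^{10} - \tfrac12 n^9 + \tfrac34 n^8 - \tfrac7{10} n^6 + \tfrac12 n^4 - \tfrac3{20} n^2 \\
S_{10}(n) &= \tfrac1{11} n^{11} - \tfrac12 n^{10} + \tfrac56 n^9 - n^7 + n^5 - \tfrac12 n^3 + \tfrac5{66} n
\end{aligned}$$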

Can you see it too? The coefficient of $n^{m+1}$ in $S_m(n)$ is always 1/(m + 1). The coefficient of $n^m$ is always -1/2. The coefficient of $n^{m-1}$ is always . . . let's see . . . m/12. The coefficient of $n^{m-2}$ is always zero. The coefficient of $n^{m-3}$ is always . . . let's see . . . hmmm . . . yes, it's -m(m-1)(m-2)/720. The coefficient of $n^{m-4}$ is always zero. And it looks as if the pattern will continue, with the coefficient of $n^{m-k}$ always being some constant times $m^{\underline{k}}$.

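That was Bernoulli's discovery. In modern notation we write the coefficients in the form

$$\begin{aligned}
S_m(n) &= \frac{1}{m+1}\Bigl(B_0n^{m+1} + \binom{m+1}{1}B_1n^m + \cdots + \binom{m+1}{m}B_mn\Bigr) \\
&= \frac{1}{m+1}\sum_{k=0}^m \binom{m+1}{k}B_kn^{m+1-k}.
\end{aligned} \tag{6.78}$$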
Bernoulli numbers are defined by an implicit recurrence relation,

$$\sum_{j=0}^m \binom{m+1}{j}B_j = [m = 0], \qquad \text{for all } m \ge 0. \tag{6.79}$$

For example, $\binom20B_0 + \binom21B_1 = 0$. The first few values turn out to be

  n   | 0    1    2    3    4     5    6    7    8     9    10   11     12
  B_n | 1  -1/2  1/6   0  -1/30   0   1/42  0  -1/30   0   5/66   0  -691/2730

(All  conjectures  about  a simple  closed  form  for  Bn  are  wiped  out  by  the 
appearance  of  the  strange  fraction  —691/2730.) 
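The implicit recurrence $\sum_{j=0}^m\binom{m+1}{j}B_j = [m = 0]$ can be solved for $B_m$ one value at a time, since the coefficient of $B_m$ is m + 1. The following Python sketch is our own illustration (the function name `bernoulli` is hypothetical); it uses exact fractions, which is the "difficult arithmetic" one has to pay for with this definition.

```python
from fractions import Fraction
from math import comb


def bernoulli(limit: int) -> list:
    """Return [B_0, B_1, ..., B_limit] from the defining recurrence.

    For m >= 1 the recurrence gives
        B_m = -(1/(m+1)) * sum_{j < m} C(m+1, j) * B_j,
    starting from B_0 = 1.
    """
    b = [Fraction(1)]
    for m in range(1, limit + 1):
        acc = sum((comb(m + 1, j) * b[j] for j in range(m)), Fraction(0))
        b.append(-acc / (m + 1))
    return b


if __name__ == "__main__":
    for n, value in enumerate(bernoulli(12)):
        print(n, value)
```

Running it reproduces the table, including the strange value of $B_{12}$.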

We can prove Bernoulli's formula (6.78) by induction on m, using the perturbation method (one of the ways we found $S_2(n) = \Box_n$ in Chapter 2):

$$\begin{aligned}
S_{m+1}(n) + n^{m+1} &= \sum_{k=0}^{n-1}(k+1)^{m+1} \\
&= \sum_{k=0}^{n-1}\sum_{j=0}^{m+1}\binom{m+1}{j}k^j = \sum_{j=0}^{m+1}\binom{m+1}{j}S_j(n).
\end{aligned} \tag{6.80}$$


Let $\hat S_m(n)$ be the right-hand side of (6.78); we wish to show that $S_m(n) = \hat S_m(n)$, assuming that $S_j(n) = \hat S_j(n)$ for 0 ≤ j < m. We begin as we did for m = 2 in Chapter 2, subtracting $S_{m+1}(n)$ from both sides of (6.80). Then we expand each $S_j(n)$ using (6.78), and regroup so that the coefficients of powers of n on the right-hand side are brought together and simplified:


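Here's some more neat stuff that you'll probably want to skim through the first time.
-Friendly TA

Start Skimming

$$\begin{aligned}
n^{m+1} &= \sum_{j=0}^m \binom{m+1}{j}S_j(n) = \sum_{j=0}^m \binom{m+1}{j}\hat S_j(n) + \binom{m+1}{m}\Delta \\
&= \sum_{j=0}^m \binom{m+1}{j}\frac{1}{j+1}\sum_{k=0}^j \binom{j+1}{k}B_kn^{j+1-k} + (m+1)\Delta \\
&= \sum_{0 \le k \le j \le m}\binom{m+1}{j}\binom{j+1}{k}\frac{B_k}{j+1}n^{j+1-k} + (m+1)\Delta \\
&= \sum_{0 \le k \le j \le m}\binom{m+1}{j}\binom{j+1}{j-k}\frac{B_{j-k}}{j+1}n^{k+1} + (m+1)\Delta \\
&= \sum_{0 \le k \le j \le m}\binom{m+1}{j}\binom{j+1}{k+1}\frac{B_{j-k}}{j+1}n^{k+1} + (m+1)\Delta \\
&= \sum_{0 \le k \le m}\frac{n^{k+1}}{k+1}\sum_{k \le j \le m}\binom{m+1}{j}\binom{j}{k}B_{j-k} + (m+1)\Delta \\
&= \sum_{0 \le k \le m}\frac{n^{k+1}}{k+1}\binom{m+1}{k}\sum_{k \le j \le m}\binom{m+1-k}{j-k}B_{j-k} + (m+1)\Delta \\
&= \sum_{0 \le k \le m}\frac{n^{k+1}}{k+1}\binom{m+1}{k}\sum_{0 \le j \le m-k}\binom{m+1-k}{j}B_j + (m+1)\Delta \\
&= \sum_{0 \le k \le m}\frac{n^{k+1}}{k+1}\binom{m+1}{k}[m-k = 0] + (m+1)\Delta \\
&= \frac{n^{m+1}}{m+1}\binom{m+1}{m} + (m+1)\Delta \\
&= n^{m+1} + (m+1)\Delta, \qquad \text{where } \Delta = S_m(n) - \hat S_m(n).
\end{aligned}$$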


(This derivation is a good review of the standard manipulations we learned in Chapter 5.) Thus Δ = 0 and $S_m(n) = \hat S_m(n)$, QED.
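With the proof in hand, Bernoulli's formula $S_m(n) = \frac{1}{m+1}\sum_{k=0}^m\binom{m+1}{k}B_kn^{m+1-k}$ becomes a practical way to sum powers. Here is a short Python sketch of our own (the names `bernoulli` and `power_sum` are hypothetical) that evaluates the formula exactly and compares it with brute-force addition.

```python
from fractions import Fraction
from math import comb


def bernoulli(limit: int) -> list:
    """Return [B_0, ..., B_limit] from sum_{j<=m} C(m+1, j) B_j = [m = 0]."""
    b = [Fraction(1)]
    for m in range(1, limit + 1):
        acc = sum((comb(m + 1, j) * b[j] for j in range(m)), Fraction(0))
        b.append(-acc / (m + 1))
    return b


def power_sum(m: int, n: int) -> Fraction:
    """Return S_m(n) = 0^m + 1^m + ... + (n-1)^m via Bernoulli's formula."""
    b = bernoulli(m)
    total = sum(
        (comb(m + 1, k) * b[k] * n ** (m + 1 - k) for k in range(m + 1)),
        Fraction(0),
    )
    return total / (m + 1)


if __name__ == "__main__":
    # Sum of the first ten fourth powers, 0^4 + 1^4 + ... + 9^4.
    print(power_sum(4, 10), sum(k**4 for k in range(10)))
```

The formula needs only m + 1 terms no matter how large n is, whereas the direct sum needs n of them.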

In Chapter 7 we'll use generating functions to obtain a much simpler proof of (6.78). The key idea will be to show that the Bernoulli numbers are the coefficients of the power series

$$\frac{z}{e^z - 1} = \sum_{n \ge 0}B_n\frac{z^n}{n!}. \tag{6.81}$$


Let's simply assume for now that equation (6.81) holds, so that we can derive some of its amazing consequences. If we add $\frac12 z$ to both sides, thereby cancelling the term $B_1z/1! = -\frac12 z$ from the right, we get

$$\frac{z}{e^z - 1} + \frac{z}{2} = \frac{z}{2}\,\frac{e^z + 1}{e^z - 1} = \frac{z}{2}\,\frac{e^{z/2} + e^{-z/2}}{e^{z/2} - e^{-z/2}} = \frac{z}{2}\coth\frac{z}{2}. \tag{6.82}$$


Here coth is the “hyperbolic cotangent” function, otherwise known in calculus books as cosh z/sinh z; we have

$$\sinh z = \frac{e^z - e^{-z}}{2}; \qquad \cosh z = \frac{e^z + e^{-z}}{2}. \tag{6.83}$$

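Changing z to -z gives $\bigl(\frac{-z}{2}\bigr)\coth\bigl(\frac{-z}{2}\bigr) = \frac{z}{2}\coth\frac{z}{2}$; hence every odd-numbered coefficient of $\frac{z}{2}\coth\frac{z}{2}$ must be zero, and we have

$$B_3 = B_5 = B_7 = B_9 = B_{11} = B_{13} = \cdots = 0. \tag{6.84}$$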


Furthermore (6.82) leads to a closed form for the coefficients of coth:

$$z\coth z = \frac{2z}{e^{2z} - 1} + \frac{2z}{2} = \sum_{n \ge 0}B_{2n}\frac{(2z)^{2n}}{(2n)!} = \sum_{n \ge 0}4^nB_{2n}\frac{z^{2n}}{(2n)!}. \tag{6.85}$$


But there isn't much of a market for hyperbolic functions; people are more interested in the “real” functions of trigonometry. We can express ordinary trigonometric functions in terms of their hyperbolic cousins by using the rules

$$\sin z = -i\sinh iz, \qquad \cos z = \cosh iz; \tag{6.86}$$

the corresponding power series are

$$\begin{aligned}
\sin z &= \frac{z^1}{1!} - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots, & \sinh z &= \frac{z^1}{1!} + \frac{z^3}{3!} + \frac{z^5}{5!} + \cdots; \\
\cos z &= \frac{z^0}{0!} - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots, & \cosh z &= \frac{z^0}{0!} + \frac{z^2}{2!} + \frac{z^4}{4!} + \cdots.
\end{aligned}$$

Hence cot z = cos z/sin z = i cosh iz/sinh iz = i coth iz, and we have

$$z\cot z = \sum_{n \ge 0}B_{2n}\frac{(2iz)^{2n}}{(2n)!} = \sum_{n \ge 0}(-4)^nB_{2n}\frac{z^{2n}}{(2n)!}. \tag{6.87}$$

Another remarkable formula for z cot z was found by Euler (exercise 73):

$$z\cot z = 1 - 2\sum_{k \ge 1}\frac{z^2}{k^2\pi^2 - z^2}. \tag{6.88}$$


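We can expand Euler's formula in powers of $z^2$, obtaining

$$\begin{aligned}
z\cot z &= 1 - 2\sum_{k \ge 1}\Bigl(\frac{z^2}{k^2\pi^2} + \frac{z^4}{k^4\pi^4} + \frac{z^6}{k^6\pi^6} + \cdots\Bigr) \\
&= 1 - 2\Bigl(\frac{z^2}{\pi^2}H_\infty^{(2)} + \frac{z^4}{\pi^4}H_\infty^{(4)} + \frac{z^6}{\pi^6}H_\infty^{(6)} + \cdots\Bigr).
\end{aligned}$$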


Equating coefficients of $z^{2n}$ with those in our other formula, (6.87), gives us an almost miraculous closed form for infinitely many infinite sums:

$$\zeta(2n) = H_\infty^{(2n)} = (-1)^{n-1}\frac{2^{2n-1}\pi^{2n}B_{2n}}{(2n)!}, \qquad \text{integer } n > 0. \tag{6.89}$$

For example,

$$\zeta(2) = H_\infty^{(2)} = 1 + \tfrac14 + \tfrac19 + \cdots = \pi^2B_2 = \pi^2/6; \tag{6.90}$$

$$\zeta(4) = H_\infty^{(4)} = 1 + \tfrac1{16} + \tfrac1{81} + \cdots = -\pi^4B_4/3 = \pi^4/90. \tag{6.91}$$

Formula (6.89) is not only a closed form for $H_\infty^{(2n)}$, it also tells us the approximate size of $B_{2n}$, since $H_\infty^{(2n)}$ is very near 1 when n is large. And it tells us that $(-1)^{n-1}B_{2n} > 0$ for all n > 0; thus the nonzero Bernoulli numbers alternate in sign.

I see, we get “real” functions by using imaginary numbers.
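The closed form $\zeta(2n) = (-1)^{n-1}2^{2n-1}\pi^{2n}B_{2n}/(2n)!$ can be compared with a brute-force partial sum of the series. The Python sketch below is our own; `zeta_even` and `zeta_partial` are hypothetical names, the Bernoulli numbers are computed from the recurrence $\sum_{j \le m}\binom{m+1}{j}B_j = [m = 0]$, and the results are floating-point approximations.

```python
import math
from fractions import Fraction
from math import comb, factorial


def bernoulli(limit: int) -> list:
    """Return [B_0, ..., B_limit] as exact fractions."""
    b = [Fraction(1)]
    for m in range(1, limit + 1):
        acc = sum((comb(m + 1, j) * b[j] for j in range(m)), Fraction(0))
        b.append(-acc / (m + 1))
    return b


def zeta_even(n: int) -> float:
    """Evaluate zeta(2n) from the Bernoulli-number closed form, for n >= 1."""
    b2n = bernoulli(2 * n)[2 * n]
    rational_part = (-1) ** (n - 1) * 2 ** (2 * n - 1) * b2n / factorial(2 * n)
    return float(rational_part) * math.pi ** (2 * n)


def zeta_partial(r: int, terms: int) -> float:
    """Add up the first `terms` terms of 1/k^r."""
    return math.fsum(1.0 / k**r for k in range(1, terms + 1))


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(2 * n, zeta_even(n), zeta_partial(2 * n, 100000))
```

The partial sums for ζ(2) converge slowly, while those for larger even arguments match the closed form almost immediately; this also shows how close $H_\infty^{(2n)}$ is to 1 once n is moderately large.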
Start Skipping


And that's not all. Bernoulli numbers also appear in the coefficients of the tangent function,

$$\tan z = \frac{\sin z}{\cos z} = \sum_{n \ge 0}(-1)^{n-1}4^n(4^n - 1)B_{2n}\frac{z^{2n-1}}{(2n)!}, \tag{6.92}$$

as well as other trigonometric functions (exercise 70). Formula (6.92) leads to another important fact about the Bernoulli numbers, namely that

$$T_{2n-1} = (-1)^{n-1}\frac{4^n(4^n - 1)}{2n}B_{2n} \text{ is a positive integer.} \tag{6.93}$$

We have, for example:

  n   | 1  3   5    7     9      11        13
  T_n | 1  2  16  272  7936  353792  22368256

(The T's are called tangent numbers.)

One way to prove (6.93), following an idea of B. F. Logan, is to consider the power series

$$\frac{\sin z + x\cos z}{\cos z - x\sin z} = x + (1 + x^2)z + (2x^3 + 2x)\frac{z^2}{2} + (6x^4 + 8x^2 + 2)\frac{z^3}{6} + \cdots = \sum_{n \ge 0}T_n(x)\frac{z^n}{n!}, \tag{6.94}$$

where $T_n(x)$ is a polynomial in x; setting x = 0 gives $T_n(0) = T_n$, the nth tangent number. If we differentiate (6.94) with respect to x, we get

$$\frac{1}{(\cos z - x\sin z)^2} = \sum_{n \ge 0}T_n'(x)\frac{z^n}{n!};$$

but if we differentiate with respect to z, we get

$$\frac{1 + x^2}{(\cos z - x\sin z)^2} = \sum_{n \ge 1}T_n(x)\frac{z^{n-1}}{(n-1)!} = \sum_{n \ge 0}T_{n+1}(x)\frac{z^n}{n!}.$$

(Try it; the cancellation is very pretty.) Therefore we have

$$T_{n+1}(x) = (1 + x^2)T_n'(x), \qquad T_0(x) = x, \tag{6.95}$$

a simple recurrence from which it follows that the coefficients of $T_n(x)$ are nonnegative integers. Moreover, we can easily prove that $T_n(x)$ has degree n + 1, and that its coefficients are alternately zero and positive. Therefore $T_{2n+1}(0) = T_{2n+1}$ is a positive integer, as claimed in (6.93).

When x = tan w, this is tan(z + w).
Recurrence (6.95) gives us a simple way to calculate Bernoulli numbers,
via  tangent  numbers,  using  only  simple  operations  on  integers;  by  contrast, 
the  defining  recurrence  (6.79)  involves  difficult  arithmetic  with  fractions. 
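Here is how that calculation might look in practice. The Python sketch below is our own; it represents $T_n(x)$ as a list of integer coefficients (an assumed representation), applies the recurrence $T_{n+1}(x) = (1 + x^2)T_n'(x)$ with $T_0(x) = x$, and then recovers $B_{2n}$ by solving $T_{2n-1} = (-1)^{n-1}\frac{4^n(4^n-1)}{2n}B_{2n}$. The function names are hypothetical.

```python
from fractions import Fraction


def tangent_polys(count: int) -> list:
    """Return coefficient lists of T_0(x), ..., T_{count-1}(x), lowest degree first."""
    polys = [[0, 1]]  # T_0(x) = x
    while len(polys) < count:
        p = polys[-1]
        deriv = [i * p[i] for i in range(1, len(p))]  # coefficients of T_n'(x)
        nxt = [0] * (len(deriv) + 2)
        for i, c in enumerate(deriv):
            nxt[i] += c        # contribution of 1 * T_n'(x)
            nxt[i + 2] += c    # contribution of x^2 * T_n'(x)
        polys.append(nxt)
    return polys


def tangent_numbers(count: int) -> list:
    """Return [T_0, ..., T_{count-1}], the constant terms T_n(0)."""
    return [p[0] for p in tangent_polys(count)]


def bernoulli_even(n: int) -> Fraction:
    """Return B_{2n} for n >= 1, computed from the tangent number T_{2n-1}."""
    t = tangent_numbers(2 * n)[2 * n - 1]
    sign = 1 if n % 2 == 1 else -1
    return Fraction(sign * 2 * n * t, 4**n * (4**n - 1))


if __name__ == "__main__":
    print(tangent_numbers(14)[1::2])
    print([str(bernoulli_even(n)) for n in range(1, 7)])
```

All the intermediate work uses integers only; a single fraction appears at the very end.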

If we want to compute the sum of mth powers from a to b - 1 instead of from 0 to n - 1, the theory of Chapter 2 tells us that

$$\sum_{k=a}^{b-1}k^m = \sum\nolimits_a^b x^m\,\delta x = S_m(b) - S_m(a). \tag{6.96}$$

This identity has interesting consequences when we consider negative values of k: We have

$$\sum_{k=-n+1}^{-1}k^m = (-1)^m\sum_{k=0}^{n-1}k^m, \qquad \text{when } m > 0,$$

hence

$$S_m(0) - S_m(-n+1) = (-1)^m\bigl(S_m(n) - S_m(0)\bigr).$$

But $S_m(0) = 0$, so we have the identity

$$S_m(1-n) = (-1)^{m+1}S_m(n), \qquad m > 0. \tag{6.97}$$

Therefore $S_m(1) = 0$. If we write the polynomial $S_m(n)$ in factored form, it will always have the factors n and (n - 1), because it has the roots 0 and 1. In general, $S_m(n)$ is a polynomial of degree m + 1 with leading term $\frac{1}{m+1}n^{m+1}$. Moreover, we can set $n = \frac12$ in (6.97) to get $S_m(\frac12) = (-1)^{m+1}S_m(\frac12)$; if m is even, this makes $S_m(\frac12) = 0$, so $(n - \frac12)$ will be an additional factor. These observations explain why we found the simple factorization

$$S_2(n) = \tfrac13 n(n - \tfrac12)(n - 1)$$

in Chapter 2; we could have used such reasoning to deduce the value of $S_2(n)$ without calculating it! Furthermore, (6.97) implies that the polynomial with the remaining factors, $\hat S_m(n) = S_m(n)/(n - \frac12)$, always satisfies

$$\hat S_m(1-n) = \hat S_m(n), \qquad m \text{ even}, \quad m > 0.$$


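It follows that $S_m(n)$ can always be written in the factored form

$$S_m(n) = \begin{cases}
\dfrac{1}{m+1}\displaystyle\prod_{k=1}^{\lceil m/2\rceil}\bigl(n - \tfrac12 - \alpha_k\bigr)\bigl(n - \tfrac12 + \alpha_k\bigr), & m \text{ odd}; \\[2ex]
\dfrac{n - \frac12}{m+1}\displaystyle\prod_{k=1}^{m/2}\bigl(n - \tfrac12 - \alpha_k\bigr)\bigl(n - \tfrac12 + \alpha_k\bigr), & m \text{ even}.
\end{cases} \tag{6.98}$$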
Stop Skipping


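Here $\alpha_1 = \frac12$, and $\alpha_2, \ldots, \alpha_{\lceil m/2\rceil}$ are appropriate complex numbers whose values depend on m. For example,

$$\begin{aligned}
S_3(n) &= n^2(n-1)^2/4; \\
S_4(n) &= n(n - \tfrac12)(n-1)\bigl(n - \tfrac12 + \sqrt{7/12}\bigr)\bigl(n - \tfrac12 - \sqrt{7/12}\bigr)/5; \\
S_5(n) &= n^2(n-1)^2\bigl(n - \tfrac12 + \sqrt{3/4}\bigr)\bigl(n - \tfrac12 - \sqrt{3/4}\bigr)/6; \\
S_6(n) &= n(n - \tfrac12)(n-1)\bigl(n - \tfrac12 + \alpha\bigr)\bigl(n - \tfrac12 - \alpha\bigr)\bigl(n - \tfrac12 + \bar\alpha\bigr)\bigl(n - \tfrac12 - \bar\alpha\bigr)/7,
\end{aligned}$$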

where $\alpha = 2^{-3/2}\,3^{-1/4}\bigl(\sqrt{\sqrt{31} + \sqrt{27}} + i\sqrt{\sqrt{31} - \sqrt{27}}\bigr)$, so that $\alpha^2 = \frac34 + \frac{i}{2\sqrt3}$.

If m is odd and greater than 1, we have $B_m = 0$; hence $S_m(n)$ is divisible by $n^2$ (and by $(n-1)^2$). Otherwise the roots of $S_m(n)$ don't seem to obey a simple law.

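Let's conclude our study of Bernoulli numbers by looking at how they relate to Stirling numbers. One way to compute $S_m(n)$ is to change ordinary powers to falling powers, since the falling powers have easy sums. After doing those easy sums we can convert back to ordinary powers:

$$\begin{aligned}
S_m(n) &= \sum_{k=0}^{n-1}k^m = \sum_{k=0}^{n-1}\sum_{j \ge 0}{m \brace j}k^{\underline{j}} = \sum_{j \ge 0}{m \brace j}\sum_{k=0}^{n-1}k^{\underline{j}} \\
&= \sum_{j \ge 0}{m \brace j}\frac{n^{\underline{j+1}}}{j+1} \\
&= \sum_{j \ge 0}{m \brace j}\frac{1}{j+1}\sum_{k \ge 0}(-1)^{j+1-k}{j+1 \brack k}n^k.
\end{aligned}$$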

Therefore, equating coefficients with those in (6.78), we must have the identity

$$\sum_{j \ge 0}{m \brace j}{j+1 \brack k}\frac{(-1)^{j+1-k}}{j+1} = \frac{1}{m+1}\binom{m+1}{k}B_{m+1-k}. \tag{6.99}$$

It would be nice to prove this relation directly, thereby discovering Bernoulli numbers in a new way. But the identities in Tables 250 and 251 don't give us any obvious handle on a proof by induction that the left-hand sum in (6.99) is a constant times $m^{\underline{k-1}}$. If k = m + 1, the left-hand sum is just ${m \brace m}{m+1 \brack m+1}/(m+1) = 1/(m+1)$, so that case is easy. And if k = m, the left-hand side sums to ${m \brace m-1}{m \brack m}m^{-1} - {m \brace m}{m+1 \brack m}(m+1)^{-1} = \frac12(m-1) - \frac12 m = -\frac12$; so that case is pretty easy too. But if k < m, the left-hand sum looks hairy. Bernoulli would probably not have discovered his numbers if he had taken this route.
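Hairy or not, the identity $\sum_{j \ge 0}{m \brace j}{j+1 \brack k}\frac{(-1)^{j+1-k}}{j+1} = \frac{1}{m+1}\binom{m+1}{k}B_{m+1-k}$ can at least be checked by machine for small m and 1 ≤ k ≤ m + 1. The Python sketch below is our own; it assumes the standard recurrences for Stirling subset and cycle numbers and the defining recurrence for Bernoulli numbers, and the function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling_subset(n: int, k: int) -> int:
    """Stirling numbers of the second kind: {n, k} = k{n-1, k} + {n-1, k-1}."""
    if n == 0 or k <= 0:
        return int(n == 0 and k == 0)
    return k * stirling_subset(n - 1, k) + stirling_subset(n - 1, k - 1)


@lru_cache(maxsize=None)
def stirling_cycle(n: int, k: int) -> int:
    """Stirling numbers of the first kind: [n, k] = (n-1)[n-1, k] + [n-1, k-1]."""
    if n == 0 or k <= 0:
        return int(n == 0 and k == 0)
    return (n - 1) * stirling_cycle(n - 1, k) + stirling_cycle(n - 1, k - 1)


def bernoulli(limit: int) -> list:
    """Return [B_0, ..., B_limit] as exact fractions."""
    b = [Fraction(1)]
    for m in range(1, limit + 1):
        acc = sum((comb(m + 1, j) * b[j] for j in range(m)), Fraction(0))
        b.append(-acc / (m + 1))
    return b


def lhs(m: int, k: int) -> Fraction:
    """Left-hand sum over 0 <= j <= m (terms with j > m vanish)."""
    total = Fraction(0)
    for j in range(m + 1):
        sign = -1 if (j + 1 - k) % 2 else 1
        total += Fraction(sign * stirling_subset(m, j) * stirling_cycle(j + 1, k), j + 1)
    return total


def rhs(m: int, k: int) -> Fraction:
    """Right-hand side C(m+1, k) * B_{m+1-k} / (m+1)."""
    return Fraction(comb(m + 1, k), m + 1) * bernoulli(m + 1)[m + 1 - k]


if __name__ == "__main__":
    for m in range(1, 6):
        print(m, [str(lhs(m, k)) for k in range(1, m + 2)])
```

Each printed row lists the coefficients of $n^1, n^2, \ldots, n^{m+1}$ in $S_m(n)$, computed the Stirling way.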
One thing we can do is replace ${m \brace j}$ by ${m+1 \brace j+1} - (j+1){m \brace j+1}$. The (j + 1) nicely cancels with the awkward denominator, and the left-hand side becomes

$$\sum_{j \ge 0}{m+1 \brace j+1}{j+1 \brack k}\frac{(-1)^{j+1-k}}{j+1} - \sum_{j \ge 0}{m \brace j+1}{j+1 \brack k}(-1)^{j+1-k}.$$


The second sum is zero, when k < m, by (6.31). That leaves us with the first sum, which cries out for a change in notation; let's rename all variables so that the index of summation is k, and so that the other parameters are m and n. Then identity (6.99) is equivalent to

$$\sum_k {n \brace k}{k \brack m}\frac{(-1)^{k-m}}{k} = \frac1n\binom{n}{m}B_{n-m} + [m = n-1]. \tag{6.100}$$


Good, we have something that looks more pleasant, although Table 251 still doesn't suggest any obvious next step.

The convolution formulas in Table 258 now come to the rescue. We can use (6.51) and (6.50) to rewrite the summand in terms of Stirling polynomials:

$$\begin{aligned}
{n \brace k}{k \brack m} &= (-1)^{n-k+1}\frac{n!}{(k-1)!}\sigma_{n-k}(-k)\cdot\frac{k!}{(m-1)!}\sigma_{k-m}(k); \\
{n \brace k}{k \brack m}\frac{(-1)^{k-m}}{k} &= (-1)^{n+1-m}\frac{n!}{(m-1)!}\sigma_{n-k}(-k)\,\sigma_{k-m}(k).
\end{aligned}$$

Things are looking good; the convolution in (6.48) yields

$$\begin{aligned}
\sum_{k=m}^n \sigma_{n-k}(-k)\,\sigma_{k-m}(k) &= \sum_{k=0}^{n-m}\sigma_{n-m-k}\bigl(-n + (n-m-k)\bigr)\,\sigma_k(m+k) \\
&= \frac{m-n}{(m)(-n)}\,\sigma_{n-m}\bigl(m - n + (n-m)\bigr).
\end{aligned}$$

Formula (6.100) is now verified, and we find that Bernoulli numbers are related to the constant terms in the Stirling polynomials:

$$(-1)^{m-1}m\,\sigma_m(0) = \frac{B_m}{m!} + [m = 1]. \tag{6.101}$$


Stop 

Skimming 


6.6  FIBONACCI  NUMBERS 

Now we come to a special sequence of numbers that is perhaps the most pleasant of all, the Fibonacci sequence $\langle F_n\rangle$:

  n   | 0  1  2  3  4  5  6   7   8   9  10  11   12   13   14
  F_n | 0  1  1  2  3  5  8  13  21  34  55  89  144  233  377

The back-to-nature nature of this example is shocking. This book should be banned.

Phyllotaxis, n. The love of taxis.

Unlike the harmonic numbers and the Bernoulli numbers, the Fibonacci numbers are nice simple integers. They are defined by the recurrence

$$\begin{aligned}
F_0 &= 0; \\
F_1 &= 1; \\
F_n &= F_{n-1} + F_{n-2}, \qquad \text{for } n > 1.
\end{aligned} \tag{6.102}$$

The simplicity of this rule (the simplest possible recurrence in which each number depends on the previous two) accounts for the fact that Fibonacci numbers occur in a wide variety of situations.

“Bee  trees”  provide  a good  example  of  how  Fibonacci  numbers  can  arise 
naturally.  Let’s  consider  the  pedigree  of  a male  bee.  Each  male  (also  known 
as  a drone)  is  produced  asexually  from  a female  (also  known  as  a queen);  each 
female,  however,  has  two  parents,  a male  and  a female.  Here  are  the  first  few 
levels  of  the  tree: 


The drone has one grandfather and one grandmother; he has one great-grandfather and two great-grandmothers; he has two great-great-grandfathers and three great-great-grandmothers. In general, it is easy to see by induction that he has exactly $F_{n+1}$ great$^n$-grandpas and $F_{n+2}$ great$^n$-grandmas.

Fibonacci  numbers  are  often  found  in  nature,  perhaps  for  reasons  similar 
to the bee-tree law. For example, a typical sunflower has a large head that contains spirals of tightly packed florets, usually with 34 winding in one direction and 55 in another. Smaller heads will have 21 and 34, or 13 and 21;
a gigantic  sunflower  with  89  and  144  spirals  was  once  exhibited  in  England. 
Similar  patterns  are  found  in  some  species  of  pine  cones. 

And  here’s  an  example  of  a different  nature  [219]:  Suppose  we  put  two 
panes of glass back-to-back. How many ways $a_n$ are there for light rays to pass through or be reflected after changing direction n times? The first few cases are:

$$a_0 = 1, \qquad a_1 = 2, \qquad a_2 = 3, \qquad a_3 = 5.$$

When n is even, we have an even number of bounces and the ray passes through; when n is odd, the ray is reflected and it re-emerges on the same side it entered. The $a_n$'s seem to be Fibonacci numbers, and a little staring at the figure tells us why: For n ≥ 2, the n-bounce rays either take their first bounce off the opposite surface and continue in $a_{n-1}$ ways, or they begin by bouncing off the middle surface and then bouncing back again to finish in $a_{n-2}$ ways. Thus we have the Fibonacci recurrence $a_n = a_{n-1} + a_{n-2}$. The initial conditions are different, but not very different, because we have $a_0 = 1 = F_2$ and $a_1 = 2 = F_3$; therefore everything is simply shifted two places, and $a_n = F_{n+2}$.

Leonardo Fibonacci introduced these numbers in 1202, and mathematicians gradually began to discover more and more interesting things about them. Edouard Lucas, the perpetrator of the Tower of Hanoi puzzle discussed in Chapter 1, worked with them extensively in the last half of the nineteenth century (in fact it was Lucas who popularized the name “Fibonacci numbers”). One of his amazing results was to use properties of Fibonacci numbers to prove that the 39-digit Mersenne number $2^{127} - 1$ is prime.

One  of  the  oldest  theorems  about  Fibonacci  numbers,  due  to  the  French 
astronomer  Jean-Dominique  Cassini  in  1680  [45],  is  the  identity 

$$F_{n+1}F_{n-1} - F_n^2 = (-1)^n, \qquad \text{for } n > 0. \tag{6.103}$$

When n = 6, for example, Cassini's identity correctly claims that $13 \cdot 5 - 8^2 = 1$.

A polynomial formula that involves Fibonacci numbers of the form $F_{n \pm k}$ for small values of k can be transformed into a formula that involves only $F_n$ and $F_{n+1}$, because we can use the rule

$$F_m = F_{m+2} - F_{m+1} \tag{6.104}$$

to express $F_m$ in terms of higher Fibonacci numbers when m < n, and we can use

$$F_m = F_{m-2} + F_{m-1} \tag{6.105}$$

to replace $F_m$ by lower Fibonacci numbers when m > n + 1. Thus, for example, we can replace $F_{n-1}$ by $F_{n+1} - F_n$ in (6.103) to get Cassini's identity in the form

$$F_{n+1}^2 - F_{n+1}F_n - F_n^2 = (-1)^n. \tag{6.106}$$

“La suite de Fibonacci possède des propriétés nombreuses fort intéressantes.”
-E. Lucas [207]

Moreover,  Cassini’s  identity  reads 

$$F_{n+2}F_n - F_{n+1}^2 = (-1)^{n+1}$$

when n is replaced by n + 1; this is the same as $(F_{n+1} + F_n)F_n - F_{n+1}^2 = (-1)^{n+1}$, which is the same as (6.106). Thus Cassini(n) is true if and only if
Cassini(n+1)  is  true;  equation  (6.103)  holds  for  all  n by  induction. 

Cassini’s  identity  is  the  basis  of  a geometrical  paradox  that  was  one  of 
Lewis Carroll's favorite puzzles [54], [258], [298]. The idea is to take a chessboard and cut it into four pieces as shown here, then to reassemble the pieces into a rectangle:

Presto: The original area of 8 x 8 = 64 squares has been rearranged to yield 5 x 13 = 65 squares! A similar construction dissects any $F_n \times F_n$ square into four pieces, using $F_{n+1}$, $F_n$, $F_{n-1}$, and $F_{n-2}$ as dimensions wherever the illustration has 13, 8, 5, and 3 respectively. The result is an $F_{n-1} \times F_{n+1}$ rectangle; by (6.103), one square has therefore been gained or lost, depending on whether n is even or odd.

The paradox is explained because . . . well, magic tricks aren't supposed to be explained.

Strictly speaking, we can't apply the reduction (6.105) unless m ≥ 2, because we haven't defined $F_n$ for negative n. A lot of maneuvering becomes easier if we eliminate this boundary condition and use (6.104) and (6.105) to define Fibonacci numbers with negative indices. For example, $F_{-1}$ turns out to be $F_1 - F_0 = 1$; then $F_{-2}$ is $F_0 - F_{-1} = -1$. In this way we deduce the values

  n   | 0  -1  -2  -3  -4  -5  -6  -7   -8  -9  -10  -11
  F_n | 0   1  -1   2  -3   5  -8  13  -21  34  -55   89

and it quickly becomes clear (by induction) that

$$F_{-n} = (-1)^{n-1}F_n, \qquad \text{integer } n. \tag{6.107}$$

Cassini’s  identity  (6.103)  is  true  for  all  integers  n,  not  just  for  n > 0,  when 
we  extend  the  Fibonacci  sequence  in  this  way. 
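The backward extension is easy to mechanize. The following Python sketch is our own (the name `fib` is a hypothetical helper); it walks forward with $F_m = F_{m-2} + F_{m-1}$ for positive indices and backward with $F_m = F_{m+2} - F_{m+1}$ for negative ones, and then tests Cassini's identity and the reflection rule $F_{-n} = (-1)^{n-1}F_n$ on a range of integers.

```python
def fib(n: int) -> int:
    """Return F_n for any integer n, positive, negative, or zero."""
    a, b = 0, 1  # the pair (F_k, F_{k+1}), starting at k = 0
    if n >= 0:
        for _ in range(n):
            a, b = b, a + b      # step up to (F_{k+1}, F_{k+2})
        return a
    for _ in range(-n):
        a, b = b - a, a          # step down to (F_{k-1}, F_k)
    return a


def cassini_holds(n: int) -> bool:
    """Check F_{n+1} F_{n-1} - F_n^2 = (-1)^n for one integer n."""
    sign = 1 if n % 2 == 0 else -1
    return fib(n + 1) * fib(n - 1) - fib(n) ** 2 == sign


if __name__ == "__main__":
    print([fib(-n) for n in range(12)])
    print(all(cassini_holds(n) for n in range(-20, 21)))
```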
The process of reducing $F_{n \pm k}$ to a combination of $F_n$ and $F_{n+1}$ by using (6.105) and (6.104) leads to the sequence of formulas

$$\begin{aligned}
F_{n+2} &= F_{n+1} + F_n & F_{n-1} &= F_{n+1} - F_n \\
F_{n+3} &= 2F_{n+1} + F_n & F_{n-2} &= -F_{n+1} + 2F_n \\
F_{n+4} &= 3F_{n+1} + 2F_n & F_{n-3} &= 2F_{n+1} - 3F_n \\
F_{n+5} &= 5F_{n+1} + 3F_n & F_{n-4} &= -3F_{n+1} + 5F_n
\end{aligned}$$

in which another pattern becomes obvious:

$$F_{n+k} = F_kF_{n+1} + F_{k-1}F_n. \tag{6.108}$$

This identity, easily proved by induction, holds for all integers k and n (positive, negative, or zero).

If  we  set  k = n in  (6.108),  we  find  that 

$$F_{2n} = F_nF_{n+1} + F_{n-1}F_n, \tag{6.109}$$

hence $F_{2n}$ is a multiple of $F_n$. Similarly,

$$F_{3n} = F_{2n}F_{n+1} + F_{2n-1}F_n,$$

and we may conclude that $F_{3n}$ is also a multiple of $F_n$. By induction,

$$F_{kn} \text{ is a multiple of } F_n, \tag{6.110}$$

for all integers k and n. This explains, for example, why $F_{15}$ (which equals 610) is a multiple of both $F_3$ and $F_5$ (which are equal to 2 and 5). Even more is true, in fact; exercise 27 proves that

$$\gcd(F_m, F_n) = F_{\gcd(m,n)}. \tag{6.111}$$

For example, $\gcd(F_{12}, F_{18}) = \gcd(144, 2584) = 8 = F_6$.
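These divisibility facts invite experiment. The Python sketch below is our own illustration, restricted to nonnegative indices for simplicity; `fib` and `addition_formula_holds` are hypothetical names. It tests the addition formula $F_{n+k} = F_kF_{n+1} + F_{k-1}F_n$, the multiple rule, and the gcd law on small cases.

```python
from math import gcd


def fib(n: int) -> int:
    """Return F_n for n >= 0 by iterating the recurrence."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def addition_formula_holds(n: int, k: int) -> bool:
    """Check F_{n+k} = F_k F_{n+1} + F_{k-1} F_n for n >= 0 and k >= 1."""
    return fib(n + k) == fib(k) * fib(n + 1) + fib(k - 1) * fib(n)


if __name__ == "__main__":
    print(fib(15), fib(15) % fib(3), fib(15) % fib(5))
    print(gcd(fib(12), fib(18)), fib(gcd(12, 18)))
```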

We can now prove a converse of (6.110): If $n > 2$ and if $F_m$ is a multiple of
$F_n$, then $m$ is a multiple of $n$. For if $F_n \backslash F_m$ then
$F_n \backslash \gcd(F_m, F_n) = F_{\gcd(m,n)} \le F_n$. This is possible only if
$F_{\gcd(m,n)} = F_n$; and our assumption that $n > 2$ makes it mandatory that
$\gcd(m, n) = n$. Hence $n \backslash m$.

An extension of these divisibility ideas was used by Yuri Matijasevich in
his famous proof [213] that there is no algorithm to decide if a given
multivariate polynomial equation with integer coefficients has a solution in integers.
Matijasevich’s lemma states that, if $n > 2$, the Fibonacci number $F_m$ is a
multiple of $F_n^2$ if and only if $m$ is a multiple of $nF_n$.

Let’s prove this by looking at the sequence $\langle F_{kn} \bmod F_n^2 \rangle$ for
$k = 1, 2, 3, \ldots$, and seeing when $F_{kn} \bmod F_n^2 = 0$. (We know that $m$ must
have the form $kn$ if $F_m \bmod F_n = 0$.) First we have $F_n \bmod F_n^2 = F_n$;
that’s not zero. Next we have

    $F_{2n} = F_n F_{n+1} + F_{n-1} F_n \equiv 2 F_n F_{n+1} \pmod{F_n^2}$,

by (6.108), since $F_{n+1} \equiv F_{n-1} \pmod{F_n}$. Similarly

    $F_{2n+1} = F_{n+1}^2 + F_n^2 \equiv F_{n+1}^2 \pmod{F_n^2}$.

This congruence allows us to compute

    $F_{3n} = F_{2n+1} F_n + F_{2n} F_{n-1}$
           $\equiv F_{n+1}^2 F_n + (2 F_n F_{n+1}) F_{n+1} = 3 F_{n+1}^2 F_n \pmod{F_n^2}$;

    $F_{3n+1} = F_{2n+1} F_{n+1} + F_{2n} F_n$
           $\equiv F_{n+1}^3 + (2 F_n F_{n+1}) F_n \equiv F_{n+1}^3 \pmod{F_n^2}$.

In general, we find by induction on $k$ that

    $F_{kn} \equiv k F_n F_{n+1}^{k-1}$  and  $F_{kn+1} \equiv F_{n+1}^k \pmod{F_n^2}$.

Now $F_{n+1}$ is relatively prime to $F_n$, so

    $F_{kn} \equiv 0 \pmod{F_n^2} \iff k F_n \equiv 0 \pmod{F_n^2}$
                              $\iff k \equiv 0 \pmod{F_n}$.


We  have  proved  Matijasevich’s  lemma. 
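
A numerical sanity check can make the lemma more tangible. The sketch below is ours (the names `lemma_holds`, `checked`, and `congruence` are illustrative, and the search bound 300 is an arbitrary assumption); it tests both the lemma and the intermediate congruence $F_{kn} \equiv k F_n F_{n+1}^{k-1} \pmod{F_n^2}$ for small parameters.

```python
def fib(n: int) -> int:
    """F_n for n >= 0."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def lemma_holds(n: int, limit: int = 300) -> bool:
    """Check that F_n^2 divides F_m exactly when n*F_n divides m, for 1 <= m < limit."""
    fn = fib(n)
    return all(
        (fib(m) % (fn * fn) == 0) == (m % (n * fn) == 0) for m in range(1, limit)
    )

# The lemma assumes n > 2.
checked = {n: lemma_holds(n) for n in range(3, 9)}

# The congruence used in the proof, for small n and k.
congruence = all(
    (fib(k * n) - k * fib(n) * fib(n + 1) ** (k - 1)) % fib(n) ** 2 == 0
    for n in range(1, 10)
    for k in range(1, 10)
)
```

For instance, with $n = 3$ we have $F_3^2 = 4$ and $nF_n = 6$, and indeed $F_6 = 8$ and $F_{12} = 144$ are multiples of 4.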

One of the most important properties of the Fibonacci numbers is the
special way in which they can be used to represent integers. Let’s write

    $j \gg k \iff j \ge k + 2$.    (6.112)

Then every positive integer has a unique representation of the form

    $n = F_{k_1} + F_{k_2} + \cdots + F_{k_r}$,    $k_1 \gg k_2 \gg \cdots \gg k_r \gg 0$.    (6.113)

(This is “Zeckendorf’s theorem” [201], [312].) For example, the representation
of one million turns out to be

    1000000 = 832040 + 121393 + 46368 + 144 + 55
            = $F_{30} + F_{26} + F_{24} + F_{12} + F_{10}$.

We can always find such a representation by using a “greedy” approach,
choosing $F_{k_1}$ to be the largest Fibonacci number $\le n$, then choosing $F_{k_2}$
to be the largest that is $\le n - F_{k_1}$, and so on. (More precisely, suppose that
$F_k \le n < F_{k+1}$; then we have $0 \le n - F_k < F_{k+1} - F_k = F_{k-1}$. If $n$ is a
Fibonacci number, (6.113) holds with $r = 1$ and $k_1 = k$. Otherwise $n - F_k$
has a Fibonacci representation $F_{k_2} + \cdots + F_{k_r}$, by induction on $n$; and (6.113)
holds if we set $k_1 = k$, because the inequalities $F_{k_2} \le n - F_k < F_{k-1}$ imply
that $k \gg k_2$.) Conversely, any representation of the form (6.113) implies that

    $F_{k_1} \le n < F_{k_1+1}$,

because the largest possible value of $F_{k_2} + \cdots + F_{k_r}$ when
$k \gg k_2 \gg \cdots \gg k_r \gg 0$ is

    $F_{k-2} + F_{k-4} + \cdots + F_{k \bmod 2 + 2} = F_{k-1} - 1$,    if $k \ge 2$.    (6.114)

(This formula is easy to prove by induction on $k$; the left-hand side is zero
when $k$ is 2 or 3.) Therefore $k_1$ is the greedily chosen value described earlier,
and the representation must be unique.
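
The greedy procedure just described translates directly into code. What follows is a minimal sketch with names of our own choosing (`fib`, `zeckendorf`); it returns the indices $k_1 \gg k_2 \gg \cdots \gg k_r$ rather than the Fibonacci numbers themselves, and recomputes `fib` naively for clarity rather than speed.

```python
def fib(k: int) -> int:
    """F_k for k >= 0."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def zeckendorf(n: int) -> list:
    """Indices k_1 >> k_2 >> ... >> k_r >> 0 with n = F_{k_1} + ... + F_{k_r}."""
    k = 2
    while fib(k + 1) <= n:       # find the largest k with F_k <= n
        k += 1
    indices = []
    while n > 0:
        if fib(k) <= n:          # greedy choice: take the largest F_k that fits
            indices.append(k)
            n -= fib(k)
        k -= 1
    return indices

million = zeckendorf(1000000)
```

Because $n - F_k < F_{k-1}$ after each greedy choice, consecutive indices automatically differ by at least 2, and the index never drops below 2.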

Any unique system of representation is a number system; therefore
Zeckendorf’s theorem leads to the Fibonacci number system. We can represent
any nonnegative integer $n$ as a sequence of 0’s and 1’s, writing

    $n = (b_m b_{m-1} \ldots b_2)_F \iff n = \sum_{k=2}^{m} b_k F_k$.    (6.115)


This number system is something like binary (radix 2) notation, except that
there never are two adjacent 1’s. For example, here are the numbers from 1
to 20, expressed Fibonacci-wise:

     1 = $(000001)_F$      6 = $(001001)_F$     11 = $(010100)_F$     16 = $(100100)_F$
     2 = $(000010)_F$      7 = $(001010)_F$     12 = $(010101)_F$     17 = $(100101)_F$
     3 = $(000100)_F$      8 = $(010000)_F$     13 = $(100000)_F$     18 = $(101000)_F$
     4 = $(000101)_F$      9 = $(010001)_F$     14 = $(100001)_F$     19 = $(101001)_F$
     5 = $(001000)_F$     10 = $(010010)_F$     15 = $(100010)_F$     20 = $(101010)_F$


The Fibonacci representation of a million, shown a minute ago, can be
contrasted with its binary representation $2^{19} + 2^{18} + 2^{17} + 2^{16} + 2^{14} + 2^9 + 2^6$:

    $(1000000)_{10} = (10001010000000000010100000000)_F$
                   $= (11110100001001000000)_2$.

The Fibonacci representation needs a few more bits because adjacent 1’s are
not permitted; but the two representations are analogous.

“Sit $1 + x + 2xx + 3x^3 + 5x^4 + 8x^5 + 13x^6 + 21x^7 + 34x^8$ &c Series nata
ex divisione Unitatis per Trinomium $1 - x - xx$.”
- A. de Moivre

“The quantities r, s, t, which show the relation of the terms, are the same as
those in the denominator of the fraction. This property, howsoever obvious it
may be, M. DeMoivre was the first that applied it to use, in the solution of
problems about infinite series, which otherwise would have been very intricate.”
- J. Stirling [281]

To add 1 in the Fibonacci number system, there are two cases: If the
“units digit” is 0, we change it to 1; that adds $F_2 = 1$, since the units digit
refers to $F_2$. Otherwise the two least significant digits will be 01, and we
change them to 10 (thereby adding $F_3 - F_2 = 1$). Finally, we must “carry”
as much as necessary by changing the digit pattern ‘011’ to ‘100’ until there
are no two 1’s in a row. (This carry rule is equivalent to replacing $F_{m+1} + F_m$
by $F_{m+2}$.) For example, to go from $5 = (1000)_F$ to $6 = (1001)_F$ or from
$6 = (1001)_F$ to $7 = (1010)_F$ requires no carrying; but to go from $7 = (1010)_F$
to $8 = (10000)_F$ we must carry twice.
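
The add-1 rule can be sketched in a few lines. This is an illustration under our own conventions, not the book's notation: digits are stored least significant first, so position $i$ stands for $F_{i+2}$, and the helper names `to_fib_digits`, `add_one`, and `value` are hypothetical. A single low-to-high pass suffices for carrying because each carry can only create a new pair of adjacent 1's further to the left.

```python
def to_fib_digits(n: int) -> list:
    """Zeckendorf digits of n, least significant first (position i stands for F_{i+2})."""
    fibs = [1, 2]                              # F_2, F_3
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = [0] * len(fibs)
    for i in reversed(range(len(fibs))):       # greedy, largest Fibonacci number first
        if fibs[i] <= n:
            digits[i] = 1
            n -= fibs[i]
    return digits

def add_one(digits: list) -> list:
    """Add 1 to a Zeckendorf digit list, carrying '011' -> '100' as needed."""
    d = list(digits) + [0, 0]                  # room for carries
    if d[0] == 0:
        d[0] = 1                               # units digit 0 -> 1 adds F_2 = 1
    else:
        d[0], d[1] = 0, 1                      # '01' -> '10' adds F_3 - F_2 = 1
    for i in range(len(d) - 2):
        if d[i] == 1 and d[i + 1] == 1:        # F_m + F_{m+1} = F_{m+2}
            d[i], d[i + 1], d[i + 2] = 0, 0, 1
    return d

def value(digits: list) -> int:
    """Integer represented by a digit list."""
    a, b, total = 1, 2, 0                      # F_2, F_3
    for bit in digits:
        total += bit * a
        a, b = b, a + b
    return total

eight = add_one(to_fib_digits(7))              # the example that carries twice
```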

So far we’ve been discussing lots of properties of the Fibonacci numbers,
but we haven’t come up with a closed formula for them. We haven’t found
closed forms for Stirling numbers, Eulerian numbers, or Bernoulli numbers
either; but we were able to discover the closed form
$H_n = \left[{n+1 \atop 2}\right]/n!$ for harmonic numbers. Is there a relation
between $F_n$ and other quantities we know? Can we “solve” the recurrence that
defines $F_n$?

The answer is yes. In fact, there’s a simple way to solve the recurrence by
using the idea of generating function that we looked at briefly in Chapter 5.
Let’s consider the infinite series

    $F(z) = F_0 + F_1 z + F_2 z^2 + \cdots = \sum_{n \ge 0} F_n z^n$.    (6.116)

If we can find a simple formula for $F(z)$, chances are reasonably good that we
can find a simple formula for its coefficients $F_n$.

In  Chapter  7 we  will  focus  on  generating  functions  in  detail,  but  it  will 
be  helpful  to  have  this  example  under  our  belts  by  the  time  we  get  there. 
The  power  series  F(z)  has  a nice  property  if  we  look  at  what  happens  when 
we  multiply  it  by  z and  by  z2: 

      $F(z) = F_0 + F_1 z + F_2 z^2 + F_3 z^3 + F_4 z^4 + F_5 z^5 + \cdots$,
     $zF(z) = F_0 z + F_1 z^2 + F_2 z^3 + F_3 z^4 + F_4 z^5 + \cdots$,
    $z^2F(z) = F_0 z^2 + F_1 z^3 + F_2 z^4 + F_3 z^5 + \cdots$.

If we now subtract the last two equations from the first, the terms that involve
$z^2$, $z^3$, and higher powers of $z$ will all disappear, because of the Fibonacci
recurrence. Furthermore the constant term $F_0$ never actually appeared in the
first place, because $F_0 = 0$. Therefore all that’s left after the subtraction is
$(F_1 - F_0)z$, which is just $z$. In other words,

    $F(z) - zF(z) - z^2F(z) = z$,

and solving for $F(z)$ gives us the compact formula

    $F(z) = \frac{z}{1 - z - z^2}$.    (6.117)
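
One way to convince ourselves that (6.117) really encodes the Fibonacci numbers is to expand the quotient by formal long division. The sketch below is an illustration with a hypothetical helper `series_coefficients`; it assumes the denominator has constant term 1, which holds for $1 - z - z^2$.

```python
def series_coefficients(num: list, den: list, terms: int) -> list:
    """First `terms` power-series coefficients of num(z)/den(z); assumes den[0] == 1.

    Polynomials are coefficient lists, lowest power first.  If
    num(z) = den(z) * C(z), then comparing coefficients of z^n gives
    c_n = num[n] - sum_{j >= 1} den[j] * c_{n-j}.
    """
    coeffs = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        for j in range(1, min(n, len(den) - 1) + 1):
            c -= den[j] * coeffs[n - j]
        coeffs.append(c)
    return coeffs

# z / (1 - z - z^2)
fib_from_gf = series_coefficients([0, 1], [1, -1, -1], 15)

# 1 / (1 - 3z), the geometric series with ratio 3
powers_of_three = series_coefficients([1], [1, -3], 5)
```

For the denominator $1 - z - z^2$ the coefficient rule reduces to $c_n = c_{n-1} + c_{n-2}$ for $n \ge 2$, which is the Fibonacci recurrence again.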




We have now boiled down all the information in the Fibonacci sequence
to a simple (although unrecognizable) expression $z/(1 - z - z^2)$. This, believe
it or not, is progress, because we can factor the denominator and then use
partial fractions to achieve a formula that we can easily expand in power series.
The coefficients in this power series will be a closed form for the Fibonacci
numbers.

The plan of attack just sketched can perhaps be understood better if
we approach it backwards. If we have a simpler generating function, say
$1/(1 - \alpha z)$ where $\alpha$ is a constant, we know the coefficients of all powers of $z$,
because

    $\frac{1}{1 - \alpha z} = 1 + \alpha z + \alpha^2 z^2 + \alpha^3 z^3 + \cdots$.

Similarly, if we have a generating function of the form $A/(1 - \alpha z) + B/(1 - \beta z)$,
the coefficients are easily determined, because

    $\frac{A}{1 - \alpha z} + \frac{B}{1 - \beta z} = A \sum_{n \ge 0} (\alpha z)^n + B \sum_{n \ge 0} (\beta z)^n$
                                  $= \sum_{n \ge 0} (A\alpha^n + B\beta^n) z^n$.    (6.118)

Therefore all we have to do is find constants $A$, $B$, $\alpha$, and $\beta$ such that

    $\frac{A}{1 - \alpha z} + \frac{B}{1 - \beta z} = \frac{z}{1 - z - z^2}$,

and we will have found a closed form $A\alpha^n + B\beta^n$ for the coefficient $F_n$ of $z^n$
in $F(z)$. The left-hand side can be rewritten

    $\frac{A}{1 - \alpha z} + \frac{B}{1 - \beta z} = \frac{A - A\beta z + B - B\alpha z}{(1 - \alpha z)(1 - \beta z)}$,

so the four constants we seek are the solutions to two polynomial equations:

    $(1 - \alpha z)(1 - \beta z) = 1 - z - z^2$;    (6.119)

    $(A + B) - (A\beta + B\alpha)z = z$.    (6.120)

We want to factor the denominator of $F(z)$ into the form $(1 - \alpha z)(1 - \beta z)$;
then we will be able to express $F(z)$ as the sum of two fractions in which the
factors $(1 - \alpha z)$ and $(1 - \beta z)$ are conveniently separated from each other.

Notice that the denominator factors in (6.119) have been written in the
form $(1 - \alpha z)(1 - \beta z)$, instead of the more usual form $c(z - \rho_1)(z - \rho_2)$ where
$\rho_1$ and $\rho_2$ are the roots. The reason is that $(1 - \alpha z)(1 - \beta z)$ leads to nicer
expansions in power series.
As usual, the authors can’t resist a trick.

We can find $\alpha$ and $\beta$ in several ways, one of which uses a slick trick: Let
us introduce a new variable $w$ and try to find the factorization

    $w^2 - wz - z^2 = (w - \alpha z)(w - \beta z)$.

Then we can simply set $w = 1$ and we’ll have the factors of $1 - z - z^2$. The
roots of $w^2 - wz - z^2 = 0$ can be found by the quadratic formula; they are

    $\frac{z \pm \sqrt{z^2 + 4z^2}}{2} = \frac{1 \pm \sqrt5}{2}\, z$.

Therefore

    $w^2 - wz - z^2 = \left(w - \frac{1 + \sqrt5}{2}\, z\right)\left(w - \frac{1 - \sqrt5}{2}\, z\right)$

and we have the constants $\alpha$ and $\beta$ we were looking for.

The ratio of one’s height to the height of one’s navel is approximately 1.618,
according to extensive empirical observations by European scholars [110].

The number $(1 + \sqrt5)/2 \approx 1.61803$ is important in many parts of
mathematics as well as in the art world, where it has been considered since ancient
times to be the most pleasing ratio for many kinds of design. Therefore it
has a special name, the golden ratio. We denote it by the Greek letter $\phi$, in
honor of Phidias who is said to have used it consciously in his sculpture. The
other root $(1 - \sqrt5)/2 = -1/\phi \approx -.61803$ shares many properties of $\phi$, so it
has the special name $\hat\phi$, “phi hat.” These numbers are roots of the equation
$w^2 - w - 1 = 0$, so we have

    $\phi^2 = \phi + 1$;    $\hat\phi^2 = \hat\phi + 1$.    (6.121)

(More about $\phi$ and $\hat\phi$ later.)

We have found the constants $\alpha = \phi$ and $\beta = \hat\phi$ needed in (6.119); now
we merely need to find $A$ and $B$ in (6.120). Setting $z = 0$ in that equation
tells us that $B = -A$, so (6.120) boils down to

    $-\hat\phi A + \phi A = 1$.

The solution is $A = 1/(\phi - \hat\phi) = 1/\sqrt5$; the partial fraction expansion of
(6.117) is therefore

    $F(z) = \frac{1}{\sqrt5}\left(\frac{1}{1 - \phi z} - \frac{1}{1 - \hat\phi z}\right)$.    (6.122)

Good, we’ve got $F(z)$ right where we want it. Expanding the fractions into
power series as in (6.118) gives a closed form for the coefficient of $z^n$:

    $F_n = \frac{1}{\sqrt5}\left(\phi^n - \hat\phi^n\right)$.    (6.123)

(This formula was first published by Leonhard Euler [91] in 1765, but people
forgot about it until it was rediscovered by Jacques Binet [25] in 1843.)
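
Formula (6.123) can be evaluated exactly, without floating point, by doing arithmetic on numbers of the form $a + b\sqrt5$ with rational $a$ and $b$. The following sketch is our own illustration (the pair representation and the names `mul`, `power`, `binet` are assumptions, not the book's method). Since $\phi^n$ and $\hat\phi^n$ are conjugates, their difference is $2b\sqrt5$, and dividing by $\sqrt5$ leaves a rational number.

```python
from fractions import Fraction

def mul(x, y):
    """(a + b*sqrt5) * (c + d*sqrt5) = (ac + 5bd) + (ad + bc)*sqrt5."""
    a, b = x
    c, d = y
    return (a * c + 5 * b * d, a * d + b * c)

def power(x, n: int):
    """x**n for n >= 0 by repeated multiplication."""
    result = (Fraction(1), Fraction(0))
    for _ in range(n):
        result = mul(result, x)
    return result

PHI = (Fraction(1, 2), Fraction(1, 2))         # (1 + sqrt5)/2
PHI_HAT = (Fraction(1, 2), Fraction(-1, 2))    # (1 - sqrt5)/2

def binet(n: int) -> Fraction:
    """(phi^n - phihat^n)/sqrt5, computed exactly."""
    a, b = power(PHI, n)
    c, d = power(PHI_HAT, n)
    # The rational parts a and c cancel; (b - d)*sqrt5 / sqrt5 = b - d.
    return (b - d) if a == c else None

def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

closed_form_matches = all(binet(n) == fib(n) for n in range(40))
```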
Before we stop to marvel at our derivation, we should check its accuracy.
For $n = 0$ the formula correctly gives $F_0 = 0$; for $n = 1$, it gives
$F_1 = (\phi - \hat\phi)/\sqrt5$, which is indeed 1. For higher powers, equations (6.121) show
that the numbers defined by (6.123) satisfy the Fibonacci recurrence, so they
must be the Fibonacci numbers by induction. (We could also expand $\phi^n$
and $\hat\phi^n$ by the binomial theorem and chase down the various powers of $\sqrt5$;
but that gets pretty messy. The point of a closed form is not necessarily to
provide us with a fast method of calculation, but rather to tell us how $F_n$
relates to other quantities in mathematics.)

With a little clairvoyance we could simply have guessed formula (6.123)
and proved it by induction. But the method of generating functions is a
powerful way to discover it; in Chapter 7 we’ll see that the same method leads us
to the solution of recurrences that are considerably more difficult.
Incidentally, we never worried about whether the infinite sums in our derivation of
(6.123) were convergent; it turns out that most operations on the coefficients
of power series can be justified rigorously whether or not the sums actually
converge [151]. Still, skeptical readers who suspect fallacious reasoning with
infinite sums can take comfort in the fact that equation (6.123), once found
by using infinite series, can be verified by a solid induction proof.

One of the interesting consequences of (6.123) is that the integer $F_n$ is
extremely close to the irrational number $\phi^n/\sqrt5$ when $n$ is large. (Since $\hat\phi$ is
less than 1 in absolute value, $\hat\phi^n$ becomes exponentially small and its effect
is almost negligible.) For example, $F_{10} = 55$ and $F_{11} = 89$ are very near

    $\frac{\phi^{10}}{\sqrt5} \approx 55.00364$  and  $\frac{\phi^{11}}{\sqrt5} \approx 88.99775$.

We can use this observation to derive another closed form,

    $F_n = \left\lfloor \frac{\phi^n}{\sqrt5} + \frac12 \right\rfloor = \frac{\phi^n}{\sqrt5}$ rounded to the nearest integer,    (6.124)

because $|\hat\phi^n/\sqrt5| < \frac12$ for all $n \ge 0$. When $n$ is even, $F_n$ is a little bit less
than $\phi^n/\sqrt5$; otherwise it is a little greater.
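
The rounding formula (6.124) is easy to try in floating point. This is only a sketch: the range $n \le 40$ is our own conservative assumption about where double-precision arithmetic is comfortably accurate, and the names `fib_rounded` and `ratio` are illustrative.

```python
from math import floor, sqrt

PHI = (1 + sqrt(5)) / 2

def ratio(n: int) -> float:
    """phi^n / sqrt5 in floating point."""
    return PHI ** n / sqrt(5)

def fib_rounded(n: int) -> int:
    """(6.124): phi^n / sqrt5 rounded to the nearest integer."""
    return floor(ratio(n) + 0.5)

def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

rounding_ok = all(fib_rounded(n) == fib(n) for n in range(41))

# F_n is slightly below phi^n/sqrt5 for even n and slightly above for odd n.
alternation_ok = all((ratio(n) > fib(n)) == (n % 2 == 0) for n in range(21))
```

For large $n$ an exact integer method is preferable, since the floating-point error in `PHI ** n` eventually exceeds 1/2.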

Cassini’s identity (6.103) can be rewritten

    $\frac{F_{n+1}}{F_n} - \frac{F_n}{F_{n-1}} = \frac{(-1)^n}{F_{n-1} F_n}$.

When $n$ is large, $1/F_{n-1}F_n$ is very small, so $F_{n+1}/F_n$ must be very nearly the
same as $F_n/F_{n-1}$; and (6.124) tells us that this ratio approaches $\phi$. In fact,
we have

    $F_{n+1} = \phi F_n + \hat\phi^n$.    (6.125)
(This identity is true by inspection when $n = 0$ or $n = 1$, and by induction
when $n > 1$; we can also prove it directly by plugging in (6.123).) The ratio
$F_{n+1}/F_n$ is very close to $\phi$, which it alternately overshoots and undershoots.

By coincidence, $\phi$ is also very nearly the number of kilometers in a mile.
(The exact number is 1.609344, since 1 inch is exactly 2.54 centimeters.)
This gives us a handy way to convert mentally between kilometers and miles,
because a distance of $F_{n+1}$ kilometers is (very nearly) a distance of $F_n$ miles.

If the USA ever goes metric, our speed limit signs will go from 55 mi/hr to
89 km/hr. Or maybe the highway people will be generous and let us go 90.

The “shift down” rule changes $n$ to $f(n/\phi)$ and the “shift up” rule changes $n$
to $f(n\phi)$, where $f(x) = \lfloor x + \phi^{-1} \rfloor$.

Suppose we want to convert a non-Fibonacci number from kilometers
to miles; what is 30 km, American style? Easy: We just use the Fibonacci
number system and mentally convert 30 to its Fibonacci representation
21 + 8 + 1 by the greedy approach explained earlier. Now we can shift each number
down one notch, getting 13 + 5 + 1. (The former ‘1’ was $F_2$, since $k_r \gg 0$ in
(6.113); the new ‘1’ is $F_1$.) Shifting down divides by $\phi$, more or less. Hence
19 miles is our estimate. (That’s pretty close; the correct answer is about
18.64 miles.) Similarly, to go from miles to kilometers we can shift up a
notch; 30 miles is approximately 34 + 13 + 2 = 49 kilometers. (That’s not
quite as close; the correct number is about 48.28.)

It turns out that this “shift down” rule gives the correctly rounded
number of miles per $n$ kilometers for all $n \le 100$, except in the cases $n = 4$, 12,
62, 75, 91, and 96, when it is off by less than 2/3 mile. And the “shift up”
rule gives either the correctly rounded number of kilometers for $n$ miles, or
1 km too many, for all $n \le 126$. (The only really embarrassing case is $n = 4$,
where the individual rounding errors for $n = 3 + 1$ both go the same direction
instead of cancelling each other out.)
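
The shift rules are mechanical enough to write down as code. Here is a small sketch with our own function names (`zeckendorf`, `shift`, `km_to_miles`, `miles_to_km`); it implements only the digit-shifting estimate, not an exact unit conversion.

```python
def fib(k: int) -> int:
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def zeckendorf(n: int) -> list:
    """Indices of the greedy Fibonacci representation of n, largest first."""
    k = 2
    while fib(k + 1) <= n:
        k += 1
    indices = []
    while n > 0:
        if fib(k) <= n:
            indices.append(k)
            n -= fib(k)
        k -= 1
    return indices

def shift(n: int, delta: int) -> int:
    """Replace each F_k in the representation of n by F_{k+delta}."""
    return sum(fib(k + delta) for k in zeckendorf(n))

def km_to_miles(km: int) -> int:
    return shift(km, -1)      # shifting down divides by phi, more or less

def miles_to_km(miles: int) -> int:
    return shift(miles, +1)   # shifting up multiplies by phi, more or less

estimate_down = km_to_miles(30)   # 21 + 8 + 1  ->  13 + 5 + 1
estimate_up = miles_to_km(30)     # 21 + 8 + 1  ->  34 + 13 + 2
```

Note that shifting down sends $F_2 = 1$ to $F_1 = 1$, which is why `fib` must accept index 1 even though representations never use it.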


6.7  CONTINUANTS 

Fibonacci numbers have important connections to the Stern-Brocot
tree that we studied in Chapter 4, and they have important generalizations to
a sequence of polynomials that Euler studied extensively. These polynomials
are called continuants, because they are the key to the study of continued
fractions like

    $a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{a_4 + \cfrac{1}{a_5 + \cfrac{1}{a_6 + \cfrac{1}{a_7}}}}}}}$.    (6.126)
The continuant polynomial $K_n(x_1, x_2, \ldots, x_n)$ has $n$ parameters, and it
is defined by the following recurrence:

    $K_0() = 1$;
    $K_1(x_1) = x_1$;
    $K_n(x_1, \ldots, x_n) = K_{n-1}(x_1, \ldots, x_{n-1})\, x_n + K_{n-2}(x_1, \ldots, x_{n-2})$.    (6.127)

For example, the next three cases after $K_1(x_1)$ are

    $K_2(x_1, x_2) = x_1 x_2 + 1$;
    $K_3(x_1, x_2, x_3) = x_1 x_2 x_3 + x_1 + x_3$;
    $K_4(x_1, x_2, x_3, x_4) = x_1 x_2 x_3 x_4 + x_1 x_2 + x_1 x_4 + x_3 x_4 + 1$.

It’s easy to see, inductively, that the number of terms is a Fibonacci number:

    $K_n(1, 1, \ldots, 1) = F_{n+1}$.    (6.128)

When the number of parameters is implied by the context, we can write
simply ‘$K$’ instead of ‘$K_n$’, just as we can omit the number of parameters
when we use the hypergeometric functions $F$ of Chapter 5. For example,
$K(x_1, x_2) = K_2(x_1, x_2) = x_1 x_2 + 1$. The subscript $n$ is of course necessary in
formulas like (6.128).
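
Recurrence (6.127) gives an immediate way to evaluate a continuant numerically. The sketch below uses a helper named `K` of our own; it treats "$K_{-1}$" as 0, which is a convenience of the loop and an assumption on our part, not something the definition states.

```python
def K(*x):
    """Continuant K_n(x_1, ..., x_n), evaluated left to right via (6.127)."""
    prev, cur = 0, 1                 # (K_{-1} taken as 0, K_0() = 1)
    for xk in x:
        prev, cur = cur, cur * xk + prev
    return cur

# Compare with the expanded polynomial for K_4 at a sample point.
x1, x2, x3, x4 = 2, 3, 5, 7
k4_by_recurrence = K(x1, x2, x3, x4)
k4_expanded = x1 * x2 * x3 * x4 + x1 * x2 + x1 * x4 + x3 * x4 + 1

# (6.128): K_n(1, ..., 1) = F_{n+1}.
all_ones = [K(*([1] * n)) for n in range(10)]
```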

Euler observed that $K(x_1, x_2, \ldots, x_n)$ can be obtained by starting with
the product $x_1 x_2 \ldots x_n$ and then striking out adjacent pairs $x_k x_{k+1}$ in all
possible ways. We can represent Euler’s rule graphically by constructing all
“Morse code” sequences of dots and dashes having length $n$, where each dot
contributes 1 to the length and each dash contributes 2; here are the Morse
code sequences of length 4:

    . . . .      . . -      . - .      - . .      - -

These dot-dash patterns correspond to the terms of $K(x_1, x_2, x_3, x_4)$; a dot
signifies a variable that’s included and a dash signifies a pair of variables
that’s excluded. For example, . - . corresponds to $x_1 x_4$.

A Morse code sequence of length $n$ that has $k$ dashes has $n - 2k$ dots and
$n - k$ symbols altogether. These dots and dashes can be arranged in $\binom{n-k}{k}$
ways; therefore if we replace each dot by $z$ and each dash by 1 we get

    $K_n(z, z, \ldots, z) = \sum_{k=0}^{n} \binom{n-k}{k} z^{n-2k}$.    (6.129)
We also know that the total number of terms in a continuant is a Fibonacci
number; hence we have the identity

    $F_{n+1} = \sum_{k=0}^{n} \binom{n-k}{k}$.    (6.130)

(A closed form for (6.129), generalizing the Euler-Binet formula (6.123) for
Fibonacci numbers, appears in (5.74).)

The relation between continuant polynomials and Morse code sequences
shows that continuants have a mirror symmetry:

    $K(x_n, \ldots, x_2, x_1) = K(x_1, x_2, \ldots, x_n)$.    (6.131)

Therefore they obey a recurrence that adjusts parameters at the left, in
addition to the right-adjusting recurrence in definition (6.127):

    $K_n(x_1, \ldots, x_n) = x_1 K_{n-1}(x_2, \ldots, x_n) + K_{n-2}(x_3, \ldots, x_n)$.    (6.132)

Both of these recurrences are special cases of a more general law:

    $K_{m+n}(x_1, \ldots, x_m, x_{m+1}, \ldots, x_{m+n})$
        $= K_m(x_1, \ldots, x_m)\, K_n(x_{m+1}, \ldots, x_{m+n})$
        $+ K_{m-1}(x_1, \ldots, x_{m-1})\, K_{n-1}(x_{m+2}, \ldots, x_{m+n})$.    (6.133)

This law is easily understood from the Morse code analogy: The first product
$K_m K_n$ yields the terms of $K_{m+n}$ in which there is no dash in the $[m, m+1]$
position, while the second product yields the terms in which there is a dash
there. If we set all the $x$’s equal to 1, this identity tells us that
$F_{m+n+1} = F_{m+1}F_{n+1} + F_m F_n$; thus, (6.108) is a special case of (6.133).

Euler [90] discovered that continuants obey an even more remarkable law,
which generalizes Cassini’s identity:

    $K_{m+n}(x_1, \ldots, x_{m+n})\, K_k(x_{m+1}, \ldots, x_{m+k})$
        $= K_{m+k}(x_1, \ldots, x_{m+k})\, K_n(x_{m+1}, \ldots, x_{m+n})$
        $+ (-1)^k K_{m-1}(x_1, \ldots, x_{m-1})\, K_{n-k-1}(x_{m+k+2}, \ldots, x_{m+n})$.    (6.134)

This law (proved in exercise 29) holds whenever the subscripts on the $K$’s are
all nonnegative. For example, when $k = 2$, $m = 1$, and $n = 3$, we have

    $K(x_1, x_2, x_3, x_4)\, K(x_2, x_3) = K(x_1, x_2, x_3)\, K(x_2, x_3, x_4) + 1$.
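
Identities with this many subscripts are easy to mistranscribe, so a randomized check is reassuring. The sketch below is ours: `K` is the same illustrative helper as before, the seed and ranges are arbitrary assumptions, and the Python slices translate the 1-based parameter ranges of (6.133) and (6.134) into 0-based list positions.

```python
import random

def K(*x):
    """Continuant via (6.127)."""
    prev, cur = 0, 1
    for xk in x:
        prev, cur = cur, cur * xk + prev
    return cur

def check_identities(trials: int = 500, seed: int = 1) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        m = rng.randint(1, 5)
        n = rng.randint(1, 5)
        k = rng.randint(0, n - 1)            # keeps the subscript n-k-1 nonnegative
        x = [rng.randint(-9, 9) for _ in range(m + n)]

        # (6.133): split the parameter list after position m.
        general_law = K(*x) == (
            K(*x[:m]) * K(*x[m:]) + K(*x[:m - 1]) * K(*x[m + 1:])
        )

        # (6.134): Euler's generalization of Cassini's identity.
        euler_law = K(*x) * K(*x[m:m + k]) == (
            K(*x[:m + k]) * K(*x[m:])
            + (-1) ** k * K(*x[:m - 1]) * K(*x[m + k + 1:])
        )
        if not (general_law and euler_law):
            return False
    return True

identities_hold = check_identities()
```

Setting $k = 0$ in (6.134) gives back (6.133), which is a useful consistency check on the reconstruction of both formulas.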


Continuant polynomials are intimately connected with Euclid’s
algorithm. Suppose, for example, that the computation of $\gcd(m, n)$ finishes
in four steps:

    $\gcd(m, n) = \gcd(n_0, n_1)$         $n_0 = m$,  $n_1 = n$;
               $= \gcd(n_1, n_2)$         $n_2 = n_0 \bmod n_1 = n_0 - q_1 n_1$;
               $= \gcd(n_2, n_3)$         $n_3 = n_1 \bmod n_2 = n_1 - q_2 n_2$;
               $= \gcd(n_3, n_4)$         $n_4 = n_2 \bmod n_3 = n_2 - q_3 n_3$;
               $= \gcd(n_4, 0) = n_4$.    $0 = n_3 \bmod n_4 = n_3 - q_4 n_4$.

Then we have

    $n_4 = n_4 = K()\, n_4$;
    $n_3 = q_4 n_4 = K(q_4)\, n_4$;
    $n_2 = q_3 n_3 + n_4 = K(q_3, q_4)\, n_4$;
    $n_1 = q_2 n_2 + n_3 = K(q_2, q_3, q_4)\, n_4$;
    $n_0 = q_1 n_1 + n_2 = K(q_1, q_2, q_3, q_4)\, n_4$.

In general, if Euclid’s algorithm finds the greatest common divisor $d$ in $k$ steps,
after computing the sequence of quotients $q_1, \ldots, q_k$, then the starting
numbers were $K(q_1, q_2, \ldots, q_k)d$ and $K(q_2, \ldots, q_k)d$. (This fact was noticed early
in the eighteenth century by Thomas Fantet de Lagny [190], who seems to
have been the first person to consider continuants explicitly. Lagny pointed
out that consecutive Fibonacci numbers, which occur as continuants when the
$q$’s take their minimum values, are therefore the smallest inputs that cause
Euclid’s algorithm to take a given number of steps.)
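
The connection with Euclid's algorithm can be observed directly: record the quotients, then rebuild the inputs from continuants. This is a minimal sketch; the names `euclid_quotients` and `K` are our own.

```python
def K(*x):
    """Continuant via (6.127)."""
    prev, cur = 0, 1
    for xk in x:
        prev, cur = cur, cur * xk + prev
    return cur

def euclid_quotients(m: int, n: int):
    """Run Euclid's algorithm on (m, n); return (quotients q_1..q_k, gcd d)."""
    quotients = []
    while n != 0:
        quotients.append(m // n)
        m, n = n, m % n
    return quotients, m

qs, d = euclid_quotients(2584, 144)      # F_18 and F_12

rebuilt = all(
    (m, n) == (K(*q) * g, K(*q[1:]) * g)
    for m in range(1, 60)
    for n in range(1, 60)
    for q, g in [euclid_quotients(m, n)]
)
```

The rebuilding step relies on the left-adjusting recurrence (6.132), since each line $n_{j-1} = q_j n_j + n_{j+1}$ prepends a quotient to the parameter list.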

Continuants are also intimately connected with continued fractions, from
which they get their name. We have, for example,

    $a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3}}} = \frac{K(a_0, a_1, a_2, a_3)}{K(a_1, a_2, a_3)}$.    (6.135)

The same pattern holds for continued fractions of any depth. It is easily
proved by induction; we have, for example,

    $\frac{K(a_0, a_1, a_2, a_3 + 1/a_4)}{K(a_1, a_2, a_3 + 1/a_4)} = \frac{K(a_0, a_1, a_2, a_3, a_4)}{K(a_1, a_2, a_3, a_4)}$,

because of the identity

    $K_n(x_1, \ldots, x_{n-1}, x_n + y)$
        $= K_n(x_1, \ldots, x_{n-1}, x_n) + K_{n-1}(x_1, \ldots, x_{n-1})\, y$.    (6.136)

(This identity is proved and generalized in exercise 30.)
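
Relation (6.135) can be verified with exact rational arithmetic. In the sketch below (our own helper names; we assume the partial quotients after the first are positive so that no division by zero occurs), a continued fraction is evaluated from the bottom up and compared with the ratio of continuants.

```python
from fractions import Fraction

def K(*x):
    """Continuant via (6.127)."""
    prev, cur = 0, 1
    for xk in x:
        prev, cur = cur, cur * xk + prev
    return cur

def continued_fraction(a: list) -> Fraction:
    """Value of a_0 + 1/(a_1 + 1/(... + 1/a_n)), evaluated from the innermost term."""
    value = Fraction(a[-1])
    for term in reversed(a[:-1]):
        value = term + 1 / value
    return value

def continuant_ratio(a: list) -> Fraction:
    """K(a_0, ..., a_n) / K(a_1, ..., a_n), as in (6.135)."""
    return Fraction(K(*a), K(*a[1:]))

sample = continued_fraction([3, 7, 15, 1])

pattern_holds = all(
    continued_fraction([a0, a1, a2, a3]) == continuant_ratio([a0, a1, a2, a3])
    for a0 in range(0, 4)
    for a1 in range(1, 5)
    for a2 in range(1, 5)
    for a3 in range(1, 5)
)
```

Identity (6.136) with $y = 1/a_4$ is what lets the induction extend the pattern by one level.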
Moreover, continuants are closely connected with the Stern-Brocot tree
discussed in Chapter 4. Each node in that tree can be represented as a
sequence of $L$’s and $R$’s, say

    $R^{a_0} L^{a_1} R^{a_2} L^{a_3} \ldots R^{a_{n-2}} L^{a_{n-1}}$,    (6.137)

where $a_0 \ge 0$, $a_1 \ge 1$, $a_2 \ge 1$, $a_3 \ge 1$, ..., $a_{n-2} \ge 1$, $a_{n-1} \ge 0$, and $n$ is
even. Using the 2 x 2 matrices $L$ and $R$ of (4.33), it is not hard to prove by
induction that the matrix equivalent of (6.137) is

    $\begin{pmatrix} K_{n-2}(a_1, \ldots, a_{n-2}) & K_{n-1}(a_1, \ldots, a_{n-2}, a_{n-1}) \\ K_{n-1}(a_0, a_1, \ldots, a_{n-2}) & K_n(a_0, a_1, \ldots, a_{n-2}, a_{n-1}) \end{pmatrix}$.    (6.138)

(The proof is part of exercise 80.) For example,

    $R^a L^b R^c L^d = \begin{pmatrix} bc + 1 & bcd + b + d \\ abc + a + c & abcd + ab + ad + cd + 1 \end{pmatrix}$.

Finally, therefore, we can use (4.34) to write a closed form for the fraction in
the Stern-Brocot tree whose $L$-and-$R$ representation is (6.137):

    $f(R^{a_0} \ldots L^{a_{n-1}}) = \frac{K_{n+1}(a_0, a_1, \ldots, a_{n-1}, 1)}{K_n(a_1, \ldots, a_{n-1}, 1)}$.    (6.139)

(This is “Halphen’s theorem” [143].) For example, to find the fraction for
$LRRL$ we have $a_0 = 0$, $a_1 = 1$, $a_2 = 2$, $a_3 = 1$, and $n = 4$; equation (6.139)
gives

    $\frac{K(0, 1, 2, 1, 1)}{K(1, 2, 1, 1)} = \frac{K(2, 1, 1)}{K(1, 2, 1, 1)} = \frac{K(2, 2)}{K(3, 2)} = \frac{5}{7}$.

(We have used the rule $K_n(x_1, \ldots, x_{n-1}, x_n + 1) = K_{n+1}(x_1, \ldots, x_{n-1}, x_n, 1)$ to
absorb leading and trailing 1’s in the parameter lists; this rule is obtained by
setting $y = 1$ in (6.136).)

A comparison of (6.135) and (6.139) shows that the fraction
corresponding to a general node (6.137) in the Stern-Brocot tree has the continued
fraction representation

    $f(R^{a_0} \ldots L^{a_{n-1}}) = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_{n-1} + \cfrac{1}{1}}}}}$.    (6.140)

Thus we can convert at sight between continued fractions and the
corresponding nodes in the Stern-Brocot tree. For example,

    $f(LRRL) = 0 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{1}}}}$.
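
Halphen's theorem (6.139) can be compared against a direct walk down the Stern-Brocot tree. The sketch below is an illustration with our own names (`walk`, `run_lengths`, `halphen`); the walk uses the mediant construction of the tree, starting between 0/1 and 1/0, and the run-length helper pads with a zero-length run when needed so that the number of runs is even, as (6.137) requires.

```python
from fractions import Fraction
from itertools import product

def K(*x):
    """Continuant via (6.127)."""
    prev, cur = 0, 1
    for xk in x:
        prev, cur = cur, cur * xk + prev
    return cur

def walk(path: str) -> Fraction:
    """Fraction reached by following L/R steps from the root, using mediants."""
    lo, hi = (0, 1), (1, 0)
    for step in path:
        mid = (lo[0] + hi[0], lo[1] + hi[1])
        if step == "L":
            hi = mid
        else:
            lo = mid
    return Fraction(lo[0] + hi[0], lo[1] + hi[1])

def run_lengths(path: str) -> list:
    """Exponents a_0, ..., a_{n-1} of R^{a_0} L^{a_1} R^{a_2} ... with n even."""
    runs, expected, i = [], "R", 0
    while i < len(path) or len(runs) % 2 == 1:
        count = 0
        while i < len(path) and path[i] == expected:
            count += 1
            i += 1
        runs.append(count)
        expected = "L" if expected == "R" else "R"
    return runs

def halphen(path: str) -> Fraction:
    """(6.139): K(a_0, ..., a_{n-1}, 1) / K(a_1, ..., a_{n-1}, 1)."""
    a = run_lengths(path)
    return Fraction(K(*a, 1), K(*a[1:], 1))

agree = all(
    halphen("".join(p)) == walk("".join(p))
    for length in range(1, 9)
    for p in product("LR", repeat=length)
)
```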


We observed in Chapter 4 that irrational numbers define infinite paths
in the Stern-Brocot tree, and that they can be represented as an infinite
string of $L$’s and $R$’s. If the infinite string for $\alpha$ is $R^{a_0} L^{a_1} R^{a_2} L^{a_3} \ldots$, there is
a corresponding infinite continued fraction

    $\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{a_4 + \cfrac{1}{a_5 + \cdots}}}}}$.    (6.141)

This infinite continued fraction can also be obtained directly: Let $\alpha_0 = \alpha$ and
for $k \ge 0$ let

    $a_k = \lfloor \alpha_k \rfloor$;    $\alpha_k = a_k + \frac{1}{\alpha_{k+1}}$.    (6.142)

The $a$’s are called the “partial quotients” of $\alpha$. If $\alpha$ is rational, say $m/n$,
this process runs through the quotients found by Euclid’s algorithm and then
stops (with $\alpha_{k+1} = \infty$).

Is Euler’s constant $\gamma$ rational or irrational? Nobody knows. We can get
partial information about this famous unsolved problem by looking for $\gamma$ in
the Stern-Brocot tree; if it’s rational we will find it, and if it’s irrational we
will find all the closest rational approximations to it. The continued fraction
for $\gamma$ begins with the following partial quotients:

    $k$      0   1   2   3   4   5   6   7   8
    $a_k$    0   1   1   2   1   2   1   4   3

Therefore its Stern-Brocot representation begins $LRLLRLLRLLLLRRRL\ldots$; no
pattern is evident. Calculations by Richard Brent [33] have shown that, if $\gamma$
is rational, its denominator must be more than 10,000 decimal digits long.
Therefore nobody believes that $\gamma$ is rational; but nobody so far has been able
to prove that it isn’t.

Or if they do, they’re not talking.

Well, $\gamma$ must be irrational, because of a little-known Einsteinian assertion:
“God does not throw huge denominators at the universe.”
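
Process (6.142) is straightforward to run on exact rationals. The sketch below is ours: `partial_quotients` and `stern_brocot_prefix` are hypothetical helper names, and for Euler's constant we substitute the 16-digit decimal 0.5772156649015329, which is an approximation we supply (not a value from the text) and is only good enough to recover the first several partial quotients.

```python
from fractions import Fraction
from math import floor

def partial_quotients(alpha: Fraction, count: int) -> list:
    """First `count` partial quotients of alpha via (6.142); stops early if alpha is exhausted."""
    quotients = []
    for _ in range(count):
        a = floor(alpha)
        quotients.append(a)
        if alpha == a:               # alpha_{k+1} would be "infinite": rational input is finished
            break
        alpha = 1 / (alpha - a)
    return quotients

def stern_brocot_prefix(quotients: list) -> str:
    """The string R^{a_0} L^{a_1} R^{a_2} ... determined by the given quotients."""
    return "".join("RL"[i % 2] * a for i, a in enumerate(quotients))

# A rational example: the quotients agree with Euclid's algorithm on (2584, 144).
rational_case = partial_quotients(Fraction(2584, 144), 20)

# Euler's constant, approximated by a decimal (an assumption of this sketch).
GAMMA_APPROX = Fraction("0.5772156649015329")
gamma_quotients = partial_quotients(GAMMA_APPROX, 9)
gamma_path = stern_brocot_prefix(gamma_quotients)
```

Only the leading quotients of an approximation are trustworthy; later ones reflect the truncated decimal rather than $\gamma$ itself.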

Let’s conclude this chapter by proving a remarkable identity that ties a lot
of these ideas together. We introduced the notion of spectrum in Chapter 3;
the spectrum of $\alpha$ is the multiset of numbers $\lfloor n\alpha \rfloor$, where $\alpha$ is a given constant.
The infinite series

    $\sum_{n \ge 1} z^{\lfloor n\phi \rfloor} = z + z^3 + z^4 + z^6 + z^8 + z^9 + \cdots$

can therefore be said to be the generating function for the spectrum of $\phi$,
where $\phi = (1 + \sqrt5)/2$ is the golden ratio. The identity we will prove,
discovered in 1976 by J.L. Davison [61], is an infinite continued fraction that
relates this generating function to the Fibonacci sequence:

    $\cfrac{z^{F_1}}{1 + \cfrac{z^{F_2}}{1 + \cfrac{z^{F_3}}{1 + \cdots}}} = (1 - z) \sum_{n \ge 1} z^{\lfloor n\phi \rfloor}$.    (6.143)

Both sides of (6.143) are interesting; let’s look first at the numbers $\lfloor n\phi \rfloor$.
If the Fibonacci representation (6.113) of $n$ is $F_{k_1} + \cdots + F_{k_r}$, we expect $n\phi$
to be approximately $F_{k_1+1} + \cdots + F_{k_r+1}$, the number we get from shifting the
Fibonacci representation left (as when converting from miles to kilometers).
In fact, we know from (6.125) that

    $n\phi = F_{k_1+1} + \cdots + F_{k_r+1} - \left(\hat\phi^{k_1} + \cdots + \hat\phi^{k_r}\right)$.

Now $\hat\phi = -1/\phi$ and $k_1 \gg \cdots \gg k_r \gg 0$, so we have

    $|\hat\phi^{k_1} + \cdots + \hat\phi^{k_r}| < \phi^{-k_r} + \phi^{-k_r-2} + \phi^{-k_r-4} + \cdots$
        $= \frac{\phi^{-k_r}}{1 - \phi^{-2}} = \phi^{1-k_r} \le \phi^{-1} < 1$;

and $\hat\phi^{k_1} + \cdots + \hat\phi^{k_r}$ has the same sign as $(-1)^{k_r}$, by a similar argument. Hence

    $\lfloor n\phi \rfloor = F_{k_1+1} + \cdots + F_{k_r+1} - [k_r(n)$ is even$]$.    (6.144)

Let us say that a number $n$ is Fibonacci odd (or F-odd for short) if its least
significant Fibonacci bit is 1; this is the same as saying that $k_r(n) = 2$.
Otherwise $n$ is Fibonacci even (F-even). For example, the smallest F-odd
numbers are 1, 4, 6, 9, 12, 14, 17, and 19. If $k_r(n)$ is even, then $n - 1$ is
F-even, by (6.114); similarly, if $k_r(n)$ is odd, then $n - 1$ is F-odd. Therefore

    $k_r(n)$ is even $\iff$ $n - 1$ is F-even.

Furthermore, if $k_r(n)$ is even, (6.144) implies that $k_r(\lfloor n\phi \rfloor) = 2$; if $k_r(n)$ is
odd, (6.144) says that $k_r(\lfloor n\phi \rfloor) = k_r(n) + 1$. Therefore $k_r(\lfloor n\phi \rfloor)$ is always
even, and we have proved that

    $\lfloor n\phi \rfloor - 1$ is always F-even.
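
Both (6.144) and the F-even property can be tested with exact integer arithmetic. The sketch below is our own: it computes $\lfloor n\phi \rfloor$ as `(n + isqrt(5*n*n)) // 2`, which is valid because $n\phi = (n + \sqrt{5n^2})/2$ and halving an integer-plus-fraction does not change the floor; the helper names are illustrative.

```python
from math import isqrt

def fib(k: int) -> int:
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def zeckendorf(n: int) -> list:
    """Indices k_1 >> ... >> k_r of the Fibonacci representation of n > 0."""
    k = 2
    while fib(k + 1) <= n:
        k += 1
    indices = []
    while n > 0:
        if fib(k) <= n:
            indices.append(k)
            n -= fib(k)
        k -= 1
    return indices

def floor_n_phi(n: int) -> int:
    """Exact value of floor(n * phi) for n >= 0."""
    return (n + isqrt(5 * n * n)) // 2

def is_f_even(m: int) -> bool:
    """True unless the least significant Fibonacci bit of m is 1 (i.e. k_r(m) = 2)."""
    return m == 0 or zeckendorf(m)[-1] != 2

def formula_6_144(n: int) -> int:
    ks = zeckendorf(n)
    return sum(fib(k + 1) for k in ks) - (1 if ks[-1] % 2 == 0 else 0)

spectrum_start = [floor_n_phi(n) for n in range(1, 7)]
f_odd_start = [m for m in range(1, 20) if not is_f_even(m)]
formula_ok = all(formula_6_144(n) == floor_n_phi(n) for n in range(1, 1000))
always_f_even = all(is_f_even(floor_n_phi(n) - 1) for n in range(1, 1000))
```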

Conversely, if $m$ is any F-even number, we can reverse this computation and
find an $n$ such that $m + 1 = \lfloor n\phi \rfloor$. (First add 1 in F-notation as explained
earlier. If no carries occur, $n$ is $(m + 2)$ shifted right; otherwise $n$ is $(m + 1)$
shifted right.) The right-hand sum of (6.143) can therefore be written

    $\sum_{n \ge 1} z^{\lfloor n\phi \rfloor} = z \sum_{m \ge 0} z^m\, [m$ is F-even$]$.    (6.145)

How about the fraction on the left? Let’s rewrite (6.143) so that the
continued fraction looks like (6.141), with all numerators 1:

    $\cfrac{1}{z^{-F_0} + \cfrac{1}{z^{-F_1} + \cfrac{1}{z^{-F_2} + \cdots}}} = \frac{1 - z}{z} \sum_{n \ge 1} z^{\lfloor n\phi \rfloor}$.    (6.146)

(This transformation is a bit tricky! The numerator and denominator of the
original fraction having $z^{F_n}$ as numerator should be divided by $z^{F_{n-1}}$.) If
we stop this new continued fraction at $1/z^{-F_n}$, its value will be a ratio of
continuants,

    $\frac{K_{n+2}(0, z^{-F_0}, z^{-F_1}, \ldots, z^{-F_n})}{K_{n+1}(z^{-F_0}, z^{-F_1}, \ldots, z^{-F_n})} = \frac{K_n(z^{-F_1}, \ldots, z^{-F_n})}{K_{n+1}(z^{-F_0}, z^{-F_1}, \ldots, z^{-F_n})}$,

as in (6.135). Let’s look at the denominator first, in hopes that it will be
tractable. Setting $Q_n = K_{n+1}(z^{-F_0}, \ldots, z^{-F_n})$, we find $Q_0 = 1$, $Q_1 = 1 + z^{-1}$,
$Q_2 = 1 + z^{-1} + z^{-2}$, $Q_3 = 1 + z^{-1} + z^{-2} + z^{-3} + z^{-4}$, and in general everything
fits beautifully and gives a geometric series

    $Q_n = 1 + z^{-1} + z^{-2} + \cdots + z^{-(F_{n+2} - 1)}$.
The corresponding numerator is $P_n = K_n(z^{-F_1}, \ldots, z^{-F_n})$; this turns out to be like $Q_n$ but with fewer terms. For example, we have

$$P_5 = z^{-1} + z^{-2} + z^{-4} + z^{-5} + z^{-7} + z^{-9} + z^{-10} + z^{-12},$$

compared with $Q_5 = 1 + z^{-1} + \cdots + z^{-12}$. A closer look reveals the pattern governing which terms are present: We have

$$P_5 = \frac{1 + z^2 + z^3 + z^5 + z^7 + z^8 + z^{10} + z^{11}}{z^{12}} = z^{-12}\sum_{m=0}^{12} z^m\,[m\ \text{is F-even}];$$

and in general we can prove by induction that

$$P_n = z^{1-F_{n+2}}\sum_{m=0}^{F_{n+2}-1} z^m\,[m\ \text{is F-even}].$$

Therefore

$$\frac{P_n}{Q_n} = \frac{\sum_{m=0}^{F_{n+2}-1} z^m\,[m\ \text{is F-even}]}{\sum_{m=0}^{F_{n+2}-1} z^m}.$$

Taking the limit as $n\to\infty$ now gives (6.146), because of (6.145).
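The claims in this derivation are easy to spot-check numerically. The following Python sketch is only an illustration, not part of the text's argument: it assumes the earlier definition that m is F-even when F_2 = 1 does not occur in its Fibonacci representation, it writes polynomials in w = 1/z as dictionaries from exponents to coefficients, and all function names are hypothetical.

```python
from math import isqrt


def fib(n: int) -> int:
    """F_n with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def is_f_even(m: int) -> bool:
    """True when F_2 = 1 is absent from the greedy Fibonacci representation of m."""
    k = 2
    while fib(k + 1) <= m:
        k += 1
    # k is now the largest index with F_k <= m (or 2 when m < 2).
    while k >= 2:
        if fib(k) <= m:
            m -= fib(k)
            if k == 2:
                return False
        k -= 1
    return True


def floor_n_phi(n: int) -> int:
    """floor(n * phi), computed exactly with integer arithmetic."""
    return (n + isqrt(5 * n * n)) // 2


def continuant(exponents: list[int]) -> dict[int, int]:
    """K(w^e1, ..., w^en) via K_n = x_n * K_{n-1} + K_{n-2}, as {exponent: coefficient}."""
    prev, cur = {}, {0: 1}  # K_{-1} = 0 and K_0 = 1
    for e in exponents:
        nxt = dict(prev)
        for d, c in cur.items():
            nxt[d + e] = nxt.get(d + e, 0) + c
        prev, cur = cur, nxt
    return cur


def p_and_q(n: int) -> tuple[dict[int, int], dict[int, int]]:
    """P_n and Q_n of the text, as polynomials in w = 1/z."""
    p = continuant([fib(k) for k in range(1, n + 1)])
    q = continuant([fib(k) for k in range(0, n + 1)])
    return p, q
```

For instance, `sorted(p_and_q(5)[0])` lists the exponents 1, 2, 4, 5, 7, 9, 10, 12 that appear (negated) in $P_5$, and `is_f_even(floor_n_phi(n) - 1)` holds for every n tried.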


Exercises 

Warmups 

1 What are the ${4 \brack 2} = 11$ permutations of $\{1,2,3,4\}$ that have exactly two cycles? (The cyclic forms appear in (6.4); non-cyclic forms like 2314 are desired instead.)

2 There are $m^n$ functions from a set of $n$ elements into a set of $m$ elements. How many of them range over exactly $k$ different function values?

3 Card stackers in the real world know that it’s wise to allow a bit of slack so that the cards will not topple over when a breath of wind comes along. Suppose the center of gravity of the top $k$ cards is required to be at least $\epsilon$ units from the edge of the $(k+1)$st card. (Thus, for example, the first card can overhang the second by at most $1-\epsilon$ units.) Can we still achieve arbitrarily large overhang, if we have enough cards?

4 Express $1/1 + 1/3 + \cdots + 1/(2n+1)$ in terms of harmonic numbers.

5 Explain  how  to  get  the  recurrence  (6.75)  from  the  definition  of  Un(x,  y) 
in  (6.74),  and  solve  the  recurrence. 
6 An explorer has left a pair of baby rabbits on an island. If baby rabbits become adults after one month, and if each pair of adult rabbits produces one pair of baby rabbits every month, how many pairs of rabbits are present after $n$ months? (After two months there are two pairs, one of which is newborn.) Find a connection between this problem and the “bee tree” in the text.

7 Show that Cassini’s identity (6.103) is a special case of (6.108), and a special case of (6.134).

8 Use the Fibonacci number system to convert 65 mi/hr into an approximate number of km/hr.

9 About how many square kilometers are in 8 square miles?

10 What is the continued fraction representation of $\phi$?

Basics 

11 What is $\sum_k (-1)^k {n \brack k}$, the row sum of Stirling’s cycle-number triangle with alternating signs, when $n$ is a nonnegative integer?

12 Prove that Stirling numbers have an inversion law analogous to (5.48):

$$g(n) = \sum_k {n \brace k}(-1)^k f(k) \iff f(n) = \sum_k {n \brack k}(-1)^k g(k).$$

13 The differential operators $D = \frac{d}{dz}$ and $\vartheta = zD$ are mentioned in Chapters 2 and 5. We have

$$\vartheta^2 = z^2D^2 + zD,$$

because $\vartheta^2 f(z) = \vartheta z f'(z) = z\frac{d}{dz} z f'(z) = z^2 f''(z) + z f'(z)$, which is $(z^2D^2 + zD)f(z)$. Similarly it can be shown that $\vartheta^3 = z^3D^3 + 3z^2D^2 + zD$. Prove the general formulas

$$\vartheta^n = \sum_k {n \brace k} z^k D^k,$$

$$z^n D^n = \sum_k {n \brack k} (-1)^{n-k} \vartheta^k,$$

for all $n \ge 0$. (These can be used to convert between differential expressions of the forms $\sum_k \alpha_k z^k f^{(k)}(z)$ and $\sum_k \beta_k \vartheta^k f(z)$, as in (5.109).)

14  Prove  the  power  identity  (6.37)  for  Eulerian  numbers. 

15  Prove  the  Eulerian  identity  (6.39)  by  taking  the  mth  difference  of  (6.37). 
16 What is the general solution of the double recurrence

$$A_{n,0} = a_n\,[n\ge0]; \qquad A_{0,k} = 0, \quad\text{if } k>0;$$

$$A_{n,k} = kA_{n-1,k} + A_{n-1,k-1}, \qquad\text{integers } k, n,$$

when $k$ and $n$ range over the set of all integers?

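17 Solve the following recurrences, assuming that $\left|{n \atop k}\right|$ is zero when $n < 0$ or $k < 0$:

a $\left|{n \atop k}\right| = \left|{n-1 \atop k}\right| + n\left|{n-1 \atop k-1}\right| + [n=k=0]$, for $n, k \ge 0$.

b $\left|{n \atop k}\right| = (n-k)\left|{n-1 \atop k}\right| + \left|{n-1 \atop k-1}\right| + [n=k=0]$, for $n, k \ge 0$.

c $\left|{n \atop k}\right| = k\left|{n-1 \atop k}\right| + k\left|{n-1 \atop k-1}\right| + [n=k=0]$, for $n, k \ge 0$.

18 Prove that the Stirling polynomials satisfy

$$(x+1)\,\sigma_n(x+1) = (x-n)\,\sigma_n(x) + x\,\sigma_{n-1}(x).$$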
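19 Prove that the generalized Stirling numbers satisfy

$$\sum_{k=0}^{n} {x+k \brace x}{x \brack x-n+k}(-1)^k \Big/ \binom{x+k}{n+1} = 0, \qquad\text{integer } n > 0.$$

$$\sum_{k=0}^{n} {x+k \brack x}{x \brace x-n+k}(-1)^k \Big/ \binom{x+k}{n+1} = 0, \qquad\text{integer } n > 0.$$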


20 Find a closed form for $\sum_{k=1}^{n} H_k^{(2)}$.

21 Show that if $H_n = a_n/b_n$, where $a_n$ and $b_n$ are integers, the denominator $b_n$ is a multiple of $2^{\lfloor\lg n\rfloor}$. Hint: Consider the number $2^{\lfloor\lg n\rfloor-1}H_n - \frac12$.

22 Prove that the infinite sum

$$\sum_{k\ge1}\left(\frac{1}{k} - \frac{1}{k+z}\right)$$

converges for all complex numbers $z$, except when $z$ is a negative integer; and show that it equals $H_z$ when $z$ is a nonnegative integer. (Therefore we can use this formula to define harmonic numbers $H_z$ when $z$ is complex.)

23 Equation (6.81) gives the coefficients of $z/(e^z - 1)$, when expanded in powers of $z$. What are the coefficients of $z/(e^z + 1)$? Hint: Consider the identity $(e^z + 1)(e^z - 1) = e^{2z} - 1$.

24 Prove that the tangent number $T_{2n+1}$ is a multiple of $2^n$. Hint: Prove that all coefficients of $T_{2n}(x)$ and $T_{2n+1}(x)$ are multiples of $2^n$.

25 Equation (6.57) proves that the worm will eventually reach the end of the rubber band at some time $N$. Therefore there must come a first time $n$ when he’s closer to the end after $n$ minutes than he was after $n-1$ minutes. Show that $n < \frac12 N$.

26 Use summation by parts to evaluate $S_n = \sum_{k=1}^{n} H_k/k$. Hint: Consider also the related sum $\sum_{k=1}^{n} H_{k-1}/k$.

27 Prove the gcd law (6.111) for Fibonacci numbers.

28 The Lucas number $L_n$ is defined to be $F_{n+1} + F_{n-1}$. Thus, according to (6.109), we have $F_{2n} = F_nL_n$. Here is a table of the first few values:

n   | 0 1 2 3 4 5  6  7  8  9  10  11  12  13
L_n | 2 1 3 4 7 11 18 29 47 76 123 199 322 521

a Use the repertoire method to show that the solution $Q_n$ to the general recurrence

$$Q_0 = \alpha; \qquad Q_1 = \beta; \qquad Q_n = Q_{n-1} + Q_{n-2}, \quad n > 1$$

can be expressed in terms of $F_n$ and $L_n$.
b Find a closed form for $L_n$ in terms of $\phi$ and $\hat\phi$.

29 Prove Euler’s identity for continuants, equation (6.134).

30 Generalize (6.136) to find an expression for the incremented continuant $K(x_1, \ldots, x_{m-1}, x_m + y, x_{m+1}, \ldots, x_n)$, when $1 \le m \le n$.

Homework  exercises 

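31 Find a closed form for the coefficients $\left|{n \atop k}\right|$ in the representation of rising powers by falling powers:

$$x^{\overline{n}} = \sum_k \left|{n \atop k}\right| x^{\underline{k}}, \qquad\text{integer } n \ge 0.$$

(For example, $x^{\overline{4}} = x^{\underline{4}} + 12x^{\underline{3}} + 36x^{\underline{2}} + 24x^{\underline{1}}$, hence $\left|{4 \atop 2}\right| = 36$.)

32 In Chapter 5 we obtained the formulas

$$\sum_{k\le m}\binom{n+k}{k} = \binom{n+m+1}{m} \qquad\text{and}\qquad \sum_{0\le k\le m}\binom{k}{n} = \binom{m+1}{n+1}$$

by unfolding the recurrence $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$ in two ways. What identities appear when the analogous recurrence ${n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1}$ is unwound?

Ah! Those were
prime years.

33 Table 250 gives the values of ${n \brack 2}$ and ${n \brace 2}$. What are closed forms (not involving Stirling numbers) for the next cases, ${n \brack 3}$ and ${n \brace 3}$?

34 What are $\left\langle{-1 \atop k}\right\rangle$ and $\left\langle{-2 \atop k}\right\rangle$, if the basic recursion relation (6.35) is assumed to hold for all integers $k$ and $n$, and if $\left\langle{n \atop k}\right\rangle = 0$ for all $k < 0$?

35 Prove that, for every $\epsilon > 0$, there exists an integer $n > 1$ (depending on $\epsilon$) such that $H_n \bmod 1 < \epsilon$.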

36  Is  it  possible  to  stack  n bricks  in  such  a way  that  the  topmost  brick  is  not 
above  any  point  of  the  bottommost  brick,  yet  a person  who  weighs  the 
same  as  100  bricks  can  balance  on  the  middle  of  the  top  brick  without 
toppling  the  pile? 

37 Express $\sum_{k=1}^{mn} (k \bmod m)/k(k+1)$ in terms of harmonic numbers, assuming that $m$ and $n$ are positive integers. What is the limiting value as $n \to \infty$?

38 Find the indefinite sum $\sum \binom{r}{k}(-1)^k H_k\,\delta k$.

39 Express $\sum_{k=1}^{n} H_k^2$ in terms of $n$ and $H_n$.

40 Prove that 1979 divides the numerator of $\sum_{k=1}^{1319} (-1)^{k-1}/k$, and give a similar result for 1987. Hint: Use Gauss’s trick to obtain a sum of fractions whose numerators are 1979. See also exercise 4.

41 Evaluate the sum

$$\sum_k \binom{\lfloor (n+k)/2 \rfloor}{k}$$

in closed form, when $n$ is an integer (possibly negative).

42 If $S$ is a set of integers, let $S+1$ be the “shifted” set $\{x+1 \mid x \in S\}$. How many subsets of $\{1,2,\ldots,n\}$ have the property that $S \cup (S+1) = \{1,2,\ldots,n+1\}$?

43  Prove  that  the  infinite  sum 


.1 

+ .01 
+ .002 
+ .0003 
+ .00005 
+ .000008 
+ .0000013 


converges  to  a rational  number. 
44 Prove the converse of Cassini’s identity (6.106): If $k$ and $m$ are integers such that $|m^2 - km - k^2| = 1$, then there is an integer $n$ such that $k = \pm F_n$ and $m = \pm F_{n+1}$.

45 Use the repertoire method to solve the general recurrence

$$X_0 = \alpha; \qquad X_1 = \beta; \qquad X_n = X_{n-1} + X_{n-2} + \gamma n + \delta.$$

46 What are $\cos 36^\circ$ and $\cos 72^\circ$?

47 Show that

$$2^{n-1}F_n = \sum_k \binom{n}{2k+1} 5^k,$$

and use this identity to deduce the values of $F_p \bmod p$ and $F_{p+1} \bmod p$ when $p$ is prime.

48 Prove that zero-valued parameters can be removed from continuant polynomials by collapsing their neighbors together:

$$K_n(x_1, \ldots, x_{m-1}, 0, x_{m+1}, \ldots, x_n) = K_{n-2}(x_1, \ldots, x_{m-2}, x_{m-1}+x_{m+1}, x_{m+2}, \ldots, x_n), \qquad 1 < m < n.$$

49 Find the continued fraction representation of the number $\sum_{n\ge1} 2^{-\lfloor n\phi\rfloor}$.

50 Define $f(n)$ for all positive integers $n$ by the recurrence

$$f(1) = 1;$$

$$f(2n) = f(n);$$

$$f(2n+1) = f(n) + f(n+1).$$

a For which $n$ is $f(n)$ even?

b Show  that  f(n)  can  be  expressed  in  terms  of  continuants. 

Exam  problems 

51  Let  p be  a prime  number. 

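a Prove that ${p \brace k} \equiv {p \brack k} \equiv 0 \pmod p$, for $1 < k < p$.

b Prove that ${p-1 \brack k} \equiv 1 \pmod p$, for $1 \le k < p$.

c Prove that ${2p-2 \brace p} \equiv {2p-2 \brack p} \equiv 0 \pmod p$.

d Prove that if $p > 3$ we have ${p \brack 2} \equiv 0 \pmod{p^2}$. Hint: Consider $p^{\underline{p}}$.

52 Let $H_n$ be written in lowest terms as $a_n/b_n$.

a Prove that $p \backslash b_n \iff p \not\backslash a_{\lfloor n/p \rfloor}$, if $p$ is prime.
b Find all $n > 0$ such that $a_n$ is divisible by 5.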
53 Find a closed form for $\sum_{k=0}^{m} \binom{n}{k}^{-1} (-1)^k H_k$, when $0 \le m \le n$. Hint: Exercise 5.42 has the sum without the $H_k$ factor.

54 Let $n > 0$. The purpose of this exercise is to show that the denominator of $B_{2n}$ is the product of all primes $p$ such that $(p-1) \backslash (2n)$.

a Show that $S_m(p) + [(p-1) \backslash m]$ is a multiple of $p$, when $p$ is prime and $m > 0$.

b Use the result of part (a) to show that

$$B_{2n} + \sum_{p\ \text{prime}} \frac{[(p-1) \backslash (2n)]}{p} = I_{2n} \quad\text{is an integer.}$$

Hint: It suffices to prove that, if $p$ is any prime, the denominator of the fraction $B_{2n} + [(p-1) \backslash (2n)]/p$ is not divisible by $p$.
c Prove that the denominator of $B_{2n}$ is always an odd multiple of 6, and it is equal to 6 for infinitely many $n$.

55 Prove (6.70) as a corollary of a more general identity, by summing

$$\sum_{0\le k<n} \binom{k}{m}\binom{x+k}{k}$$

and differentiating with respect to $x$.

56 Evaluate $\sum_{k\ne m} \binom{n}{k} (-1)^k k^{n+1}/(k-m)$ in closed form as a function of the integers $m$ and $n$. (The sum is over all integers $k$ except for the value $k = m$.)

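57 The “wraparound binomial coefficients of order 5” are defined by

$$\left(\!\!\left({n \atop k}\right)\!\!\right) = \left(\!\!\left({n-1 \atop k}\right)\!\!\right) + \left(\!\!\left({n-1 \atop (k-1) \bmod 5}\right)\!\!\right), \qquad n > 0,$$

and $\left(\!\!\left({0 \atop k}\right)\!\!\right) = [k=0]$. Let $Q_n$ be the difference between the largest and smallest of these numbers in row $n$:

$$Q_n = \max_{0\le k<5} \left(\!\!\left({n \atop k}\right)\!\!\right) - \min_{0\le k<5} \left(\!\!\left({n \atop k}\right)\!\!\right).$$

Find and prove a relation between $Q_n$ and the Fibonacci numbers.

58 Find closed forms for $\sum_{n\ge0} F_n^2 z^n$ and $\sum_{n\ge0} F_n^3 z^n$. What do you deduce about the quantity $F_{n+1}^3 - 4F_n^3 - F_{n-1}^3$?

59 Prove that if $m$ and $n$ are positive integers, there exists an integer $x$ such that $F_x \equiv m \pmod{3^n}$.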

60  Find  all  positive  integers  n such  that  either  Fn  + 1 or  Fn  — 1 is  a prime 
number. 
61 Prove the identity

$$\sum_{k=0}^{n} \frac{1}{F_{2^k}} = 3 - \frac{F_{2^n-1}}{F_{2^n}}, \qquad\text{integer } n \ge 1.$$

What is $\sum_{k=0}^{n} 1/F_{3\cdot2^k}$?

62 Let $A_n = \phi^n + \phi^{-n}$ and $B_n = \phi^n - \phi^{-n}$.

a Find constants $\alpha$ and $\beta$ such that $A_n = \alpha A_{n-1} + \beta A_{n-2}$ and $B_n = \alpha B_{n-1} + \beta B_{n-2}$ for all $n \ge 0$.

b Express $A_n$ and $B_n$ in terms of $F_n$ and $L_n$ (see exercise 28).
c Prove that $\sum_{k=1}^{n} 1/(F_{2k+1} + 1) = B_n/A_{n+1}$.
d Find a closed form for $\sum_{k=1}^{n} 1/(F_{2k+1} - 1)$.

Bonus  problems  Bogus  problems 

63 How many permutations $\pi_1\pi_2\ldots\pi_n$ of $\{1,2,\ldots,n\}$ have exactly $k$ indices $j$ such that

a $\pi_i < \pi_j$ for all $i < j$? (Such $j$ are called “left-to-right maxima.”)

b $\pi_j > j$? (Such $j$ are called “excedances.”)

64  What  is  the  denominator  of  [, tv]  ■ when  this  fraction  is  reduced  to 
lowest  terms? 

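65 Prove the identity

$$\int_0^1 \!\!\cdots\! \int_0^1 f(\lfloor x_1 + \cdots + x_n \rfloor)\,dx_1 \ldots dx_n = \sum_k \left\langle{n \atop k}\right\rangle \frac{f(k)}{n!}.$$

66 Show that $\left\langle\!\!\left\langle{n \atop 1}\right\rangle\!\!\right\rangle = 2\left\langle{n \atop 1}\right\rangle$, and find a closed form for $\left\langle\!\!\left\langle{n \atop 2}\right\rangle\!\!\right\rangle$.

67 Find a closed form for $\sum_{k=1}^{n} k^2 H_{n+k}$.

68 Show that the generalized harmonic numbers of exercise 22 have the power series expansion

$$H_z = \sum_{n\ge2} (-1)^n H_\infty^{(n)} z^{n-1}.$$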


69  Prove  that  the  generalized  factorial  of  equation  (5.83)  can  be  written 


no+ 


eyz 


z\ 


by  considering  the  limit  as  n — » 00  of  the  first  n factors  of  this  infinite 
product.  Show  that  ^(z!)  is  related  to  the  general  harmonic  numbers  of 
exercise  22.  . 
70 Prove that the tangent function has the power series (6.92), and find the corresponding series for $z/\sin z$ and $\ln((\tan z)/z)$.

71 Find a relation between the numbers $T_n(1)$ and the coefficients of $1/\cos z$.

72 What is $\sum_k (-1)^k \left\langle{n \atop k}\right\rangle$, the row sum of Euler’s triangle with alternating signs?

73 Prove that, for all integers $n \ge 1$,

$$z\cot z = \frac{z}{2^n}\cot\frac{z}{2^n} - \frac{z}{2^n}\tan\frac{z}{2^n} + \sum_{k=1}^{2^{n-1}-1} \frac{z}{2^n}\left(\cot\frac{z+k\pi}{2^n} + \cot\frac{z-k\pi}{2^n}\right),$$

and show that the limit of the $k$th summand is $2z^2/(z^2 - k^2\pi^2)$ for fixed $k$ as $n \to \infty$.

74  Prove  the  following  relation  that  connects  Stirling  numbers,  Bernoulli 
numbers,  and  Catalan  numbers: 


1 


75  Show  that  the  four  chessboard  pieces  of  the  64  = 65  paradox  can  also  be 
reassembled  to  prove  that  64  = 63. 

76  A sequence  defined  by  the  recurrence 


$$A_1 = x, \qquad A_2 = y, \qquad A_n = A_{n-1} + A_{n-2}$$

has $A_m = 1000000$ for some $m$. What positive integers $x$ and $y$ make $m$ as large as possible?

77 The text describes a way to change a formula involving $F_{n\pm k}$ to a formula that involves $F_n$ and $F_{n+1}$ only. Therefore it’s natural to wonder if two such “reduced” formulas can be equal when they aren’t identical in form. Let $P(x,y)$ be a polynomial in $x$ and $y$ with integer coefficients. Find a necessary and sufficient condition that $P(F_{n+1}, F_n) = 0$ for all $n \ge 0$.

78  Explain  how  to  add  positive  integers,  working  entirely  in  the  Fibonacci 
number  system. 

79 Is it possible that a sequence $\langle A_n \rangle$ satisfying the Fibonacci recurrence $A_n = A_{n-1} + A_{n-2}$ can contain no prime numbers, if $A_0$ and $A_1$ are relatively prime?
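80 Show that continuant polynomials appear in the matrix product

$$\begin{pmatrix} 0 & 1 \\ 1 & x_1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & x_2 \end{pmatrix} \cdots \begin{pmatrix} 0 & 1 \\ 1 & x_n \end{pmatrix}$$

and in the determinant

$$\det \begin{pmatrix} x_1 & 1 & 0 & \cdots & 0 \\ -1 & x_2 & 1 & & \vdots \\ 0 & -1 & x_3 & \ddots & 0 \\ \vdots & & \ddots & \ddots & 1 \\ 0 & \cdots & 0 & -1 & x_n \end{pmatrix}.$$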


81 Generalizing (6.146), find a continued fraction related to the generating function $\sum_{n\ge1} z^{\lfloor n\alpha\rfloor}$, when $\alpha$ is any positive irrational number.

82 Let $m$ and $n$ be odd, positive integers. Find closed forms for

$$S^+_{m,n} = \sum_{k\ge0} \frac{1}{F_{2mk+n} + F_m}; \qquad S^-_{m,n} = \sum_{k\ge0} \frac{1}{F_{2mk+n} - F_m}.$$

Hint: The sums in exercise 62 are $S^+_{1,3} - S^+_{1,2n+3}$ and $S^-_{1,3} - S^-_{1,2n+3}$.

83 Let $\alpha$ be an irrational number in $(0,1)$ and let $a_1, a_2, a_3, \ldots$ be the partial quotients in its continued fraction representation. Show that $|D(\alpha, n)| < 2$ when $n = K(a_1, \ldots, a_m)$, where $D$ is the discrepancy defined in Chapter 3.

84 Let $Q_n$ be the largest denominator on level $n$ of the Stern-Brocot tree. (Thus $(Q_0, Q_1, Q_2, Q_3, Q_4, \ldots) = (1, 2, 3, 5, 8, \ldots)$ according to the diagram in Chapter 4.) Prove that $Q_n = F_{n+2}$.

85  Characterize  all  N such  that  the  Fibonacci  residues 


{F0modN,  Fi  mod  N,  F2  mod  N,  . ..} 

form  the  complete  set  {0,  1,.  . . , N — 1}.  (See  exercise  59.) 

Research  problems 

86 What is the best way to extend the definition of ${n \brace k}$ to arbitrary real values of $n$ and $k$?

87 Let $H_n$ be written in lowest terms as $a_n/b_n$, as in exercise 52.

a Are there infinitely many $n$ with $11 \backslash a_n$?

b Are there infinitely many $n$ with $b_n = \mathrm{lcm}(1, 2, \ldots, n)$? (Two such values are $n = 250$ and $n = 1000$.)

88 Prove that $\gamma$ and $e^\gamma$ are irrational.
89 Develop a general theory of the solutions to the two-parameter recurrence

$$\left|{n \atop k}\right| = (\alpha n + \beta k + \gamma)\left|{n-1 \atop k}\right| + (\alpha' n + \beta' k + \gamma')\left|{n-1 \atop k-1}\right| + [n=k=0], \qquad\text{for } n, k \ge 0,$$

assuming that $\left|{n \atop k}\right| = 0$ when $n < 0$ or $k < 0$. (Binomial coefficients, Stirling numbers, Eulerian numbers, and the sequences of exercises 17 and 31 are special cases.) What special values $(\alpha, \beta, \gamma, \alpha', \beta', \gamma')$ yield “fundamental solutions” in terms of which the general solution can be expressed?


7 


Generating  Functions 


THE  MOST  POWERFUL  WAY  to  deal  with  sequences  of  numbers,  as  far 
as anybody knows, is to manipulate infinite series that “generate” those sequences. We’ve learned a lot of sequences and we’ve seen a few generating
functions;  now  we’re  ready  to  explore  generating  functions  in  depth,  and  to 
see  how  remarkably  useful  they  are. 


7.1  DOMINO  THEORY  AND  CHANGE 


Generating  functions  are  important  enough,  and  for  many  of  us  new 
enough,  to  justify  a relaxed  approach  as  we  begin  to  look  at  them  more  closely. 
So  let’s  start  this  chapter  with  some  fun  and  games  as  we  try  to  develop  our 
intuitions  about  generating  functions.  We  will  study  two  applications  of  the 
ideas,  one  involving  dominoes  and  the  other  involving  coins. 

How many ways $T_n$ are there to completely cover a 2 × n rectangle with 2 × 1 dominoes? We assume that the dominoes are identical (either because they’re face down, or because someone has rendered them indistinguishable, say by painting them all red); thus only their orientations, vertical or horizontal, matter, and we can imagine that we’re working with domino-shaped tiles. For example, there are three tilings of a 2 × 3 rectangle, namely ▯▯▯, ▯═, and ═▯; so $T_3 = 3$. (Here ▯ stands for a vertical domino and ═ for two horizontal dominoes, one above the other.)

To find a closed form for general $T_n$ we do our usual first thing, look at small cases. When n = 1 there’s obviously just one tiling, ▯; and when n = 2 there are two, ▯▯ and ═.

How about when n = 0; how many tilings of a 2 × 0 rectangle are there? It’s not immediately clear what this question means, but we’ve seen similar situations before: There is one permutation of zero objects (namely the empty permutation), so 0! = 1. There is one way to choose zero things from n things (namely to choose nothing), so $\binom{n}{0} = 1$. There is one way to partition the empty set into zero nonempty subsets, but there are no such ways to partition a nonempty set; so ${n \brace 0} = [n=0]$. By such reasoning we can conclude that there’s just one way to tile a 2 × 0 rectangle with dominoes, namely to use no dominoes; therefore $T_0 = 1$. (This spoils the simple pattern $T_n = n$ that holds when n = 1, 2, and 3; but that pattern was probably doomed anyway, since $T_0$ wants to be 1 according to the logic of the situation.) A proper understanding of the null case turns out to be useful whenever we want to solve an enumeration problem.

“Let me count the ways.”
- E. B. Browning

Let’s look at one more small case, n = 4. There are two possibilities for tiling the left edge of the rectangle: we put either a vertical domino or two horizontal dominoes there. If we choose a vertical one, the partial solution is a vertical domino ▯ followed by an empty 2 × 3 rectangle, and the remaining 2 × 3 rectangle can be covered in $T_3$ ways. If we choose two horizontals, the partial solution ═ (a stacked pair of horizontal dominoes followed by an empty 2 × 2 rectangle) can be completed in $T_2$ ways. Thus $T_4 = T_3 + T_2 = 5$. (The five tilings are ▯▯▯▯, ▯▯═, ▯═▯, ═▯▯, and ══.)

We now know the first five values of $T_n$:

n   | 0 1 2 3 4
T_n | 1 1 2 3 5

These look suspiciously like the Fibonacci numbers, and it’s not hard to see why: The reasoning we used to establish $T_4 = T_3 + T_2$ easily generalizes to $T_n = T_{n-1} + T_{n-2}$, for $n \ge 2$. Thus we have the same recurrence here as for the Fibonacci numbers, except that the initial values $T_0 = 1$ and $T_1 = 1$ are a little different. But these initial values are the consecutive Fibonacci numbers $F_1$ and $F_2$, so the T’s are just Fibonacci numbers shifted up one place:

$$T_n = F_{n+1}, \qquad\text{for } n \ge 0.$$

(We consider this to be a closed form for $T_n$, because the Fibonacci numbers are important enough to be considered “known.” Also, $F_n$ itself has a closed form (6.123) in terms of algebraic operations.) Notice that this equation confirms the wisdom of setting $T_0 = 1$.
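The left-edge argument translates directly into a small enumeration. The sketch below is only an illustration under its own conventions: it writes each tiling as a string in which "V" is a vertical domino and "H" is a pair of stacked horizontal dominoes, and the function names are hypothetical.

```python
def tilings(n: int) -> list[str]:
    """All tilings of a 2 x n rectangle, as words over 'V' (width 1) and 'H' (width 2)."""
    if n < 0:
        return []
    if n == 0:
        return [""]  # the null tiling
    # The left edge holds either a vertical domino or two horizontal dominoes.
    return ["V" + t for t in tilings(n - 1)] + ["H" + t for t in tilings(n - 2)]


def fib(n: int) -> int:
    """F_n with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    for n in range(8):
        print(n, len(tilings(n)), fib(n + 1))
```

Running it prints matching counts in the last two columns, in agreement with $T_n = F_{n+1}$ for the values tried.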

But what does all this have to do with generating functions? Well, we’re about to get to that; there’s another way to figure out what $T_n$ is. This new way is based on a bold idea. Let’s consider the “sum” of all possible 2 × n tilings, for all $n \ge 0$, and call it T:

T = | + ▯ + ▯▯ + ═ + ▯▯▯ + ▯═ + ═▯ + ···.   (7.1)

(The first term ‘|’ on the right stands for the null tiling of a 2 × 0 rectangle; ▯ is a vertical domino and ═ is a pair of horizontal dominoes, one above the other.) This sum T represents lots of information. It’s useful because it lets us prove things about T as a whole rather than forcing us to prove them (by induction) about its individual terms.

To boldly go where no tiling has gone before.

The terms of this sum stand for tilings, which are combinatorial objects. We won’t be fussy about what’s considered legal when infinitely many tilings are added together; everything can be made rigorous, but our goal right now is to expand our consciousness beyond conventional algebraic formulas.

We’ve added the patterns together, and we can also multiply them, by juxtaposition. For example, we can multiply the tilings ▯ (a vertical domino) and ═ (a stacked pair of horizontal dominoes) to get the new tiling ▯═. But notice that multiplication is not commutative; that is, the order of multiplication counts: ▯═ is different from ═▯.

Using this notion of multiplication it’s not hard to see that the null tiling | plays a special role: it is the multiplicative identity. For instance, | × ═ = ═ × | = ═.

Now we can use domino arithmetic to manipulate the infinite sum T:

T = | + ▯ + ▯▯ + ═ + ▯▯▯ + ▯═ + ═▯ + ···
  = | + ▯(| + ▯ + ▯▯ + ═ + ···) + ═(| + ▯ + ▯▯ + ═ + ···)
  = | + ▯T + ═T.   (7.2)

Every valid tiling occurs exactly once in each right side, so what we’ve done is reasonable even though we’re ignoring the cautions in Chapter 2 about “absolute convergence.” The bottom line of this equation tells us that everything in T is either the null tiling, or is a vertical tile followed by something else in T, or is two horizontal tiles followed by something else in T.

So now let’s try to solve the equation for T. Replacing the T on the left by |T (where | is the null tiling) and subtracting the last two terms on the right from both sides of the equation, we get

(| − ▯ − ═)T = |.   (7.3)

I have a gut feeling that these sums must converge, as long as the dominoes are small enough.

For a consistency check, here’s an expanded version, with ▯ for a vertical domino and ═ for a stacked pair of horizontal dominoes:

  | + ▯ + ▯▯ + ═ + ▯▯▯ + ▯═ + ═▯ + ···
    − ▯ − ▯▯ − ▯▯▯ − ▯═ − ▯▯▯▯ − ▯▯═ − ▯═▯ − ···
    − ═ − ═▯ − ═▯▯ − ══ − ═▯▯▯ − ═▯═ − ══▯ − ···

Every term in the top row, except the first, is cancelled by a term in either the second or third row, so our equation is correct.

So far it’s been fairly easy to make combinatorial sense of the equations we’ve been working with. Now, however, to get a compact expression for T we cross a combinatorial divide. With a leap of algebraic faith we divide both sides of equation (7.3) by | − ▯ − ═ to get

T = | / (| − ▯ − ═).   (7.4)

(Multiplication isn’t commutative, so we’re on the verge of cheating, by not distinguishing between left and right division. In our application it doesn’t matter, because | commutes with everything. But let’s not be picky, unless our wild ideas lead to paradoxes.)

The next step is to expand this fraction as a power series, using the rule

$$\frac{1}{1-z} = 1 + z + z^2 + z^3 + \cdots.$$

The null tiling |, which is the multiplicative identity for our combinatorial arithmetic, plays the part of 1, the usual multiplicative identity; and ▯ + ═ (a vertical domino plus a stacked pair of horizontal dominoes) plays z. So we get the expansion

| / (| − ▯ − ═) = | + (▯ + ═) + (▯ + ═)^2 + (▯ + ═)^3 + ···
  = | + (▯ + ═) + (▯▯ + ▯═ + ═▯ + ══)
    + (▯▯▯ + ▯▯═ + ▯═▯ + ▯══ + ═▯▯ + ═▯═ + ══▯ + ═══) + ···.

This is T, but the tilings are arranged in a different order than we had before. Every tiling appears exactly once in this sum; for example, ▯═▯▯═▯═ appears in the expansion of (▯ + ═)^7.

We can get useful information from this infinite sum by compressing it down, ignoring details that are not of interest. For example, we can imagine that the patterns become unglued and that the individual dominoes commute with each other; then a term like ▯═▯▯═▯═ becomes $▯^4▭^6$, because it contains four verticals ▯ and six horizontals ▭. Collecting like terms gives us the series

$$T = | + ▯ + ▯^2 + ▭^2 + ▯^3 + 2▯▭^2 + ▯^4 + 3▯^2▭^2 + ▭^4 + \cdots.$$

The $2▯▭^2$ here represents the two terms of the old expansion, ▯═ and ═▯, that have one vertical and two horizontal dominoes; similarly $3▯^2▭^2$ represents the three terms ▯▯═, ▯═▯, and ═▯▯. We’re essentially treating ▯ and ▭ as ordinary (commutative) variables.

We can find a closed form for the coefficients in the commutative version of T by using the binomial theorem:

$$\frac{|}{| - (▯ + ▭^2)} = | + (▯ + ▭^2) + (▯ + ▭^2)^2 + (▯ + ▭^2)^3 + \cdots$$

$$= \sum_{k\ge0} (▯ + ▭^2)^k$$

$$= \sum_{j,k\ge0} \binom{k}{j} ▯^j ▭^{2k-2j}$$

$$= \sum_{j,m\ge0} \binom{j+m}{j} ▯^j ▭^{2m}. \tag{7.5}$$

(The last step replaces $k-j$ by $m$; this is legal because we have $\binom{k}{j} = 0$ when $0 \le k < j$.) We conclude that $\binom{j+m}{j}$ is the number of ways to tile a 2 × (j + 2m) rectangle with j vertical dominoes and 2m horizontal dominoes. For example, we recently looked at the 2 × 10 tiling ▯═▯▯═▯═, which involves four verticals and six horizontals; there are $\binom{4+3}{4} = 35$ such tilings in all, so one of the terms in the commutative version of T is $35▯^4▭^6$.

We can suppress even more detail by ignoring the orientation of the dominoes. Suppose we don’t care about the horizontal/vertical breakdown; we only want to know about the total number of 2 × n tilings. (This, in fact, is the number $T_n$ we started out trying to discover.) We can collect the necessary information by simply substituting a single quantity, z, for the vertical domino ▯ and the horizontal domino ▭. And we might as well also replace the null tiling | by 1, getting

$$T = \frac{1}{1 - z - z^2}. \tag{7.6}$$

This is the generating function (6.117) for Fibonacci numbers, except for a missing factor of z in the numerator; so we conclude that the coefficient of $z^n$ in T is $F_{n+1}$.
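One way to see these coefficients emerge is to expand the reciprocal of the denominator term by term. The sketch below is an illustration rather than anything from the text: `reciprocal_series` is a hypothetical helper that assumes the denominator is given as a coefficient list with constant term 1, and it uses the fact that the product of the series with its denominator must equal 1.

```python
def reciprocal_series(denominator: list[int], terms: int) -> list[int]:
    """First `terms` power-series coefficients of 1 / (d0 + d1 z + d2 z^2 + ...), with d0 = 1."""
    if denominator[0] != 1:
        raise ValueError("this sketch assumes a constant term of 1")
    coeffs: list[int] = []
    for n in range(terms):
        # Coefficient of z^n in (series * denominator) must be [n = 0].
        total = 1 if n == 0 else 0
        for j in range(1, min(n, len(denominator) - 1) + 1):
            total -= denominator[j] * coeffs[n - j]
        coeffs.append(total)
    return coeffs


if __name__ == "__main__":
    # 1 / (1 - z - z^2)
    print(reciprocal_series([1, -1, -1], 10))
```

With the denominator $1 - z - z^2$ the loop reduces to the recurrence $c_n = c_{n-1} + c_{n-2}$ with $c_0 = 1$, which is the shifted Fibonacci sequence.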

The compact representations | / (| − ▯ − ═), $|/(| - ▯ - ▭^2)$, and $1/(1 - z - z^2)$ that we have deduced for T are called generating functions, because they generate the coefficients of interest.

Incidentally, our derivation implies that the number of 2 × n domino tilings with exactly m pairs of horizontal dominoes is $\binom{n-m}{m}$. (This follows because there are $j = n - 2m$ vertical dominoes, hence there are

$$\binom{j+m}{j} = \binom{j+m}{m} = \binom{n-m}{m}$$

ways to do the tiling according to our formula.) We observed in Chapter 6 that $\binom{n-m}{m}$ is the number of Morse code sequences of length n that contain m dashes; in fact, it’s easy to see that 2 × n domino tilings correspond directly to Morse code sequences, with each vertical domino ▯ acting as a dot and each stacked horizontal pair ═ as a dash. (The tiling ▯═▯▯═▯═ corresponds to ‘• − • • − • −’.) Thus domino tilings are closely related to the continuant polynomials we studied in Chapter 6. It’s a small world.
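The count by number of horizontal pairs can be checked with a small table. This is a minimal sketch with a hypothetical function name; it relies only on the left-edge argument used earlier (a tiling starts with a vertical domino of width 1, or with a horizontal pair of width 2).

```python
from math import comb


def tilings_by_horizontal_pairs(n: int) -> list[int]:
    """counts[m] = number of 2 x n tilings with exactly m pairs of horizontal dominoes."""
    max_pairs = n // 2
    # table[w][m] counts tilings of width w that contain m horizontal pairs.
    table = [[0] * (max_pairs + 1) for _ in range(n + 1)]
    table[0][0] = 1  # the null tiling
    for w in range(1, n + 1):
        for m in range(max_pairs + 1):
            table[w][m] = table[w - 1][m]  # start with a vertical domino
            if w >= 2 and m >= 1:
                table[w][m] += table[w - 2][m - 1]  # start with a horizontal pair
    return table[n]


if __name__ == "__main__":
    n = 10
    print(tilings_by_horizontal_pairs(n))
    print([comb(n - m, m) for m in range(n // 2 + 1)])
```

For n = 10 both lines agree, and the entry for m = 3 is the 35 tilings with four verticals and six horizontals mentioned above.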

We  have  solved  the  Tn  problem  in  two  ways.  The  first  way,  guessing  the 
answer  and  proving  it  by  induction,  was  easier;  the  second  way,  using  infinite 
sums  of  domino  patterns  and  distilling  out  the  coefficients  of  interest,  was 
fancier.  But  did  we  use  the  second  method  only  because  it  was  amusing  to 
play  with  dominoes  as  if  they  were  algebraic  variables?  No;  the  real  reason 
for  introducing  the  second  way  was  that  the  infinite-sum  approach  is  a lot 
more powerful. The second method applies to many more problems, because it doesn’t require us to make magic guesses.


Now I’m disoriented.
Let’s generalize up a notch, to a problem where guesswork will be beyond us. How many ways $U_n$ are there to tile a 3 × n rectangle with dominoes?

The first few cases of this problem tell us a little: The null tiling gives $U_0 = 1$. There is no valid tiling when n = 1, since a 2 × 1 domino doesn’t fill a 3 × 1 rectangle, and since there isn’t room for two. The next case, n = 2, can easily be done by hand; there are three tilings (three stacked horizontal dominoes, or a side-by-side pair of vertical dominoes either above or below a single horizontal domino), so $U_2 = 3$. (Come to think of it we already knew this, because the previous problem told us that $T_3 = 3$; the number of ways to tile a 3 × 2 rectangle is the same as the number to tile a 2 × 3.) When n = 3, as when n = 1, there are no tilings. We can convince ourselves of this either by making a quick exhaustive search or by looking at the problem from a higher level: The area of a 3 × 3 rectangle is odd, so we can’t possibly tile it with dominoes whose area is even. (The same argument obviously applies to any odd n.) Finally, when n = 4 there seem to be about a dozen tilings; it’s difficult to be sure about the exact number without spending a lot of time to guarantee that the list is complete.
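An exhaustive search by machine settles the small cases without the risk of an incomplete list. The sketch below is a brute-force illustration, not the method the text is about to develop; `count_tilings` is a hypothetical name, and the board is stored as a bitmask whose cells are numbered column by column.

```python
from functools import lru_cache


def count_tilings(rows: int, cols: int) -> int:
    """Count domino tilings of a rows x cols board by always covering the first empty cell."""
    full = (1 << (rows * cols)) - 1

    @lru_cache(maxsize=None)
    def fill(mask: int) -> int:
        if mask == full:
            return 1  # every cell is covered (this also handles the null tiling)
        cell = 0
        while (mask >> cell) & 1:
            cell += 1
        r, c = cell % rows, cell // rows  # column-major numbering
        total = 0
        # Vertical domino: this cell and the one below it in the same column.
        if r + 1 < rows and not (mask >> (cell + 1)) & 1:
            total += fill(mask | (1 << cell) | (1 << (cell + 1)))
        # Horizontal domino: this cell and the same row of the next column.
        if c + 1 < cols and not (mask >> (cell + rows)) & 1:
            total += fill(mask | (1 << cell) | (1 << (cell + rows)))
        return total

    return fill(0)


if __name__ == "__main__":
    print([count_tilings(3, n) for n in range(7)])
```

The search reports no tilings for odd n, three tilings for n = 2, and eleven for n = 4, which is indeed "about a dozen".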

So  let’s  try  the  infinite-sum  approach  that  worked  last  time: 

    U = | + (the three 3 × 2 tilings) + (the 3 × 4 tilings) + ··· .        (7.7)

Every non-null tiling begins with one of three blocks, written here as
[▯/▭] (a vertical domino above a horizontal one), [▭/▯] (a horizontal domino
above a vertical one), and [≡] (three stacked horizontals); but unfortunately
the first two of these three possibilities don’t simply factor out and leave
us with U again. The sum of all terms in U that begin with [▯/▭] can, however,
be written as [▯/▭]V, where

    V = ▯ + (larger tilings of the same mutilated shape) + ···

is the sum of all domino tilings of a mutilated 3 × n rectangle that has its
lower left corner missing. Similarly, the terms of U that begin with [▭/▯] can
be written [▭/▯]Λ, where

    Λ = ▯ + (larger tilings of the same mutilated shape) + ···

consists of all rectangular tilings lacking their upper left corner. The
series Λ is a mirror image of V. These factorizations allow us to write

    U = | + [▯/▭]V + [▭/▯]Λ + [≡]U.

And we can factor V and Λ as well, because such tilings can begin in only
two ways:

    V = ▯U + [≡′]V,
    Λ = ▯U + [≡″]Λ,

where [≡′] and [≡″] denote three horizontals with the bottom one (respectively
the top one) shifted one cell to the right.

Now we have three equations in three unknowns (U, V, and Λ). We can solve
them by first solving for V and Λ in terms of U, then plugging the results
into the equation for U:

    V = (| − [≡′])⁻¹ ▯U,    Λ = (| − [≡″])⁻¹ ▯U;
    U = | + [▯/▭](| − [≡′])⁻¹ ▯U + [▭/▯](| − [≡″])⁻¹ ▯U + [≡]U.

And the final equation can be solved for U, giving the compact formula

    U = | / ( | − [▯/▭](| − [≡′])⁻¹ ▯ − [▭/▯](| − [≡″])⁻¹ ▯ − [≡] ).        (7.8)

This expression defines the infinite sum U, just as (7.4) defines T.

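The next step is to go commutative. Everything simplifies beautifully
when we detach all the dominoes and use only powers of ▯ (a vertical domino)
and ▭ (a horizontal domino):

$$U = \frac{1}{1 - ▯^2▭(1-▭^3)^{-1} - ▯^2▭(1-▭^3)^{-1} - ▭^3}$$

$$= \frac{1-▭^3}{(1-▭^3)^2 - 2▯^2▭}$$

$$= \frac{(1-▭^3)^{-1}}{1 - 2▯^2▭(1-▭^3)^{-2}}$$

$$= \frac{1}{1-▭^3} + \frac{2▯^2▭}{(1-▭^3)^3} + \frac{4▯^4▭^2}{(1-▭^3)^5} + \frac{8▯^6▭^3}{(1-▭^3)^7} + \cdots$$

$$= \sum_{k\ge0} \frac{2^k ▯^{2k} ▭^k}{(1-▭^3)^{2k+1}}$$

$$= \sum_{k,m\ge0} \binom{m+2k}{m}\, 2^k ▯^{2k} ▭^{k+3m}.$$

I learned in another class about “regular expressions.” If I’m not
mistaken, we can write U as a starred sum of three terms, namely each of
the two L-shaped opening blocks followed by any number of staggered
triples of horizontals and then a vertical, or else three stacked
horizontals, in the language of regular expressions; so there must be
some connection between regular expressions and generating functions.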

(This derivation deserves careful scrutiny. The last step uses the formula
$(1-w)^{-2k-1} = \sum_m \binom{m+2k}{m} w^m$, identity (5.56).) Let’s take a good look at
the bottom line to see what it tells us. First, it says that every 3 × n tiling
uses an even number of vertical dominoes. Moreover, if there are $2k$ verticals,
there must be at least $k$ horizontals, and the total number of horizontals must
be $k + 3m$ for some $m \ge 0$. Finally, the number of possible tilings with $2k$
verticals and $k + 3m$ horizontals is exactly $\binom{m+2k}{m} 2^k$.

We now are able to analyze the 3 × 4 tilings that left us doubtful when we
began looking at the 3 × n problem. When $n = 4$ the total area is 12, so we
need six dominoes altogether. There are $2k$ verticals and $k + 3m$ horizontals,
for some $k$ and $m$; hence $2k + k + 3m = 6$. In other words, $k + m = 2$.
If we use no verticals, then $k = 0$ and $m = 2$; the number of possibilities
is $\binom{2+0}{2} 2^0 = 1$. (This accounts for the tiling made of six horizontals.)
If we use two verticals, then $k = 1$ and $m = 1$; there are $\binom{1+2}{1} 2^1 = 6$ such
tilings. And if we use four verticals, then $k = 2$ and $m = 0$; there are
$\binom{0+4}{0} 2^2 = 4$ such tilings, making a total of $U_4 = 11$. In general if n is
even, this reasoning shows that $k + m = \frac12 n$, hence $\binom{m+2k}{m} = \binom{n/2+k}{n/2-k}$ and the
total number of 3 × n tilings is

$$U_n = \sum_k \binom{n/2+k}{n/2-k} 2^k = \sum_m \binom{n-m}{m} 2^{n/2-m}. \qquad (7.9)$$

As before, we can also substitute z for both ▯ (vertical) and ▭ (horizontal),
getting a generating function that doesn’t discriminate between dominoes of
particular persuasions. The result is

$$U = \frac{1}{1 - z^3(1-z^3)^{-1} - z^3(1-z^3)^{-1} - z^3} = \frac{1 - z^3}{1 - 4z^3 + z^6}. \qquad (7.10)$$

If we expand this quotient into a power series, we get

$$U = 1 + U_2 z^3 + U_4 z^6 + U_6 z^9 + U_8 z^{12} + \cdots,$$

a generating function for the numbers $U_n$. (There’s a curious mismatch
between subscripts and exponents in this formula, but it is easily explained.
The coefficient of $z^9$, for example, is $U_6$, which counts the tilings of a
3 × 6 rectangle. This is what we want, because every such tiling contains
nine dominoes.)

Ah yes, I remember when we had half-dollars.
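The counts discussed above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names are hypothetical, the brute-force search is just one straightforward way to enumerate tilings, and the closed sum is the one obtained by putting $k + m = n/2$ into the double series. The third function reads coefficients off the quotient $(1 - z^3)/(1 - 4z^3 + z^6)$ after substituting $w = z^3$.

```python
from math import comb


def tilings_brute(n):
    """Count domino tilings of a 3 x n board by exhaustive search."""
    rows, cols = 3, n
    filled = [[False] * cols for _ in range(rows)]

    def fill():
        # Find the first empty cell, scanning column by column.
        for c in range(cols):
            for r in range(rows):
                if filled[r][c]:
                    continue
                count = 0
                # Cover it with a vertical domino (this cell and the one below).
                if r + 1 < rows and not filled[r + 1][c]:
                    filled[r][c] = filled[r + 1][c] = True
                    count += fill()
                    filled[r][c] = filled[r + 1][c] = False
                # Cover it with a horizontal domino (this cell and the one to the right).
                if c + 1 < cols and not filled[r][c + 1]:
                    filled[r][c] = filled[r][c + 1] = True
                    count += fill()
                    filled[r][c] = filled[r][c + 1] = False
                return count
        return 1  # no empty cell left: one complete tiling

    return fill()


def tilings_formula(n):
    """Sum over m of C(n-m, m) * 2^(n/2 - m) for even n; zero for odd n."""
    if n % 2:
        return 0
    half = n // 2
    return sum(comb(n - m, m) * 2 ** (half - m) for m in range(half + 1))


def tilings_series(count):
    """First `count` coefficients of (1 - w)/(1 - 4w + w^2), where w = z^3."""
    u = []
    for j in range(count):
        term = (1 if j == 0 else 0) - (1 if j == 1 else 0)
        if j >= 1:
            term += 4 * u[j - 1]
        if j >= 2:
            term -= u[j - 2]
        u.append(term)
    return u


if __name__ == "__main__":
    print([tilings_brute(n) for n in range(7)])
    print([tilings_formula(n) for n in range(7)])
    print(tilings_series(4))
```

The three computations agree for the small cases tried here, including the value 11 for the 3 × 4 rectangle that was hard to settle by hand.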

We  could  proceed  to  analyze  (7.10)  and  get  a closed  form  for  the  coeffi- 
cients, but  it’s  better  to  save  that  for  later  in  the  chapter  after  we’ve  gotten 
more  experience.  So  let’s  divest  ourselves  of  dominoes  for  the  moment  and 
proceed to the next advertised problem, “change.”

How many ways are there to pay 50 cents? We assume that the payment
must be made with pennies ①, nickels ⑤, dimes ⑩, quarters ㉕, and
half-dollars ㊿. George Pólya [239] popularized this problem by showing that it
can be solved with generating functions in an instructive way.

Let’s  set  up  infinite  sums  that  represent  all  possible  ways  to  give  change, 
just  as  we  tackled  the  domino  problems  by  working  with  infinite  sums  that 
represent  all  possible  domino  patterns.  It’s  simplest  to  start  by  working  with 
fewer  varieties  of  coins,  so  let’s  suppose  first  that  we  have  nothing  but  pennies. 
The  sum  of  all  ways  to  leave  some  number  of  pennies  (but  just  pennies)  in 
change  can  be  written 


    P = / + ① + ①① + ①①① + ①①①① + ···
      = / + ① + ①² + ①³ + ①⁴ + ··· .

The first term stands for the way to leave no pennies, the second term stands
for one penny, then two pennies, three pennies, and so on. Now if we’re
allowed to use both pennies and nickels, the sum of all possible ways is

    N = P + ⑤P + ⑤⑤P + ⑤⑤⑤P + ⑤⑤⑤⑤P + ···
      = (/ + ⑤ + ⑤² + ⑤³ + ⑤⁴ + ···) P,

since each payment has a certain number of nickels chosen from the first
factor and a certain number of pennies chosen from P. (Notice that N is
not the sum / + (① + ⑤) + (① + ⑤)² + (① + ⑤)³ + ···, because such a
sum includes many types of payment more than once. For example, the term
(① + ⑤)² = ①① + ①⑤ + ⑤① + ⑤⑤ treats ①⑤ and ⑤① as if they were
different, but we want to list each set of coins only once without respect to
order.)

Similarly, if dimes are permitted as well, we get the infinite sum

    D = (/ + ⑩ + ⑩² + ⑩³ + ⑩⁴ + ···) N,

which includes terms like ⑩³⑤³①⁵ = ⑩⑩⑩⑤⑤⑤①①①①① when it is
expanded in full. Each of these terms is a different way to make change.

Adding quarters and then half-dollars to the realm of possibilities gives

    Q = (/ + ㉕ + ㉕² + ㉕³ + ㉕⁴ + ···) D;
    C = (/ + ㊿ + ㊿² + ㊿³ + ㊿⁴ + ···) Q.

Coins of the realm.

Our problem is to find the number of terms in C worth exactly 50¢.

A simple trick solves this problem nicely: We can replace ① by $z$, ⑤
by $z^5$, ⑩ by $z^{10}$, ㉕ by $z^{25}$, and ㊿ by $z^{50}$. Then each term is replaced by $z^n$,
where n is the monetary value of the original term. For example, the term
㊿⑩⑤⑤① becomes $z^{50+10+5+5+1} = z^{71}$. The four ways of paying 13 cents,
namely ⑩①³, ⑤①⁸, ⑤²①³, and ①¹³, each reduce to $z^{13}$; hence the coefficient
of $z^{13}$ will be 4 after the z-substitutions are made.

Let $P_n$, $N_n$, $D_n$, $Q_n$, and $C_n$ be the numbers of ways to pay n cents
when we’re allowed to use coins that are worth at most 1, 5, 10, 25, and 50
cents, respectively. Our analysis tells us that these are the coefficients of $z^n$
in the respective power series

    $P = 1 + z + z^2 + z^3 + z^4 + \cdots,$
    $N = (1 + z^5 + z^{10} + z^{15} + z^{20} + \cdots)\,P,$
    $D = (1 + z^{10} + z^{20} + z^{30} + z^{40} + \cdots)\,N,$
    $Q = (1 + z^{25} + z^{50} + z^{75} + z^{100} + \cdots)\,D,$
    $C = (1 + z^{50} + z^{100} + z^{150} + z^{200} + \cdots)\,Q.$

How many pennies are there, really? If n is greater than, say, $10^{10}$,
I bet that $P_n = 0$ in the “real world.”

Obviously $P_n = 1$ for all $n \ge 0$. And a little thought proves that we have
$N_n = \lfloor n/5 \rfloor + 1$: To make n cents out of pennies and nickels, we must choose
either 0 or 1 or ... or $\lfloor n/5 \rfloor$ nickels, after which there’s only one way to supply
the requisite number of pennies. Thus $P_n$ and $N_n$ are simple; but the values
of $D_n$, $Q_n$, and $C_n$ are increasingly more complicated.

One way to deal with these formulas is to realize that $1 + z^m + z^{2m} + \cdots$
is just $1/(1 - z^m)$. Thus we can write

    $P = 1/(1 - z),$
    $N = P/(1 - z^5),$
    $D = N/(1 - z^{10}),$
    $Q = D/(1 - z^{25}),$
    $C = Q/(1 - z^{50}).$

Multiplying by the denominators, we have

    $(1 - z)\,P = 1,$
    $(1 - z^5)\,N = P,$
    $(1 - z^{10})\,D = N,$
    $(1 - z^{25})\,Q = D,$
    $(1 - z^{50})\,C = Q.$

Now  we  can  equate  coefficients  of  zn  in  these  equations,  getting  recurrence 
relations  from  which  the  desired  coefficients  can  quickly  be  computed: 

    $P_n = P_{n-1} + [n = 0],$
    $N_n = N_{n-5} + P_n,$
    $D_n = D_{n-10} + N_n,$
    $Q_n = Q_{n-25} + D_n,$
    $C_n = C_{n-50} + Q_n.$

For example, the coefficient of $z^n$ in $D = (1 - z^{25})Q$ is equal to $Q_n - Q_{n-25}$;
so we must have $Q_n - Q_{n-25} = D_n$, as claimed.

We could unfold these recurrences and find, for example, that
$Q_n = D_n + D_{n-25} + D_{n-50} + D_{n-75} + \cdots$, stopping when the subscripts get negative.
But the non-iterated form is convenient because each coefficient is computed
with just one addition, as in Pascal’s triangle.

Let’s use the recurrences to find $C_{50}$. First, $C_{50} = C_0 + Q_{50}$; so we want
to know $Q_{50}$. Then $Q_{50} = Q_{25} + D_{50}$, and $Q_{25} = Q_0 + D_{25}$; so we also want
to know $D_{50}$ and $D_{25}$. These $D_n$ depend in turn on $D_{40}$, $D_{30}$, $D_{20}$, $D_{15}$,
$D_{10}$, $D_5$, and on $N_{50}$, $N_{45}$, ..., $N_5$. A simple calculation therefore suffices to
determine all the necessary coefficients:

     n     0    5   10   15   20   25   30   35   40   45   50

    P_n    1    1    1    1    1    1    1    1    1    1    1
    N_n    1    2    3    4    5    6    7    8    9   10   11
    D_n    1    2    4    6    9   12   16        25        36
    Q_n    1                        13                       49
    C_n    1                                                 50

The final value in the table gives us our answer, $C_{50}$: There are exactly 50 ways
to leave a 50-cent tip.

(Not counting the option of charging the tip to a credit card.)

How about a closed form for $C_n$? Multiplying the equations together
gives us the compact expression

$$C = \frac{1}{1-z}\;\frac{1}{1-z^5}\;\frac{1}{1-z^{10}}\;\frac{1}{1-z^{25}}\;\frac{1}{1-z^{50}}, \qquad (7.11)$$

but it’s not obvious how to get from here to the coefficient of $z^n$. Fortunately
there is a way; we’ll return to this problem later in the chapter.
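The one-addition-per-coefficient recurrences translate directly into a short program. Here is a minimal Python sketch, with a hypothetical function name and an in-place update chosen for brevity: dividing a power series by $1 - z^c$ is the same as setting each new coefficient to the old one plus the new coefficient c places earlier.

```python
def ways_to_pay(amount, coins=(1, 5, 10, 25, 50)):
    """Return a list whose n-th entry counts the ways to pay n cents.

    Starts from the series 1 (only the empty payment) and divides by
    (1 - z^c) for each coin c, via new[n] = new[n - c] + old[n].
    """
    ways = [1] + [0] * amount
    for c in coins:
        for n in range(c, amount + 1):
            # ways[n - c] already holds the updated value, ways[n] the old one.
            ways[n] += ways[n - c]
    return ways


if __name__ == "__main__":
    print(ways_to_pay(50, coins=(1, 5))[50])           # pennies and nickels
    print(ways_to_pay(50, coins=(1, 5, 10))[50])       # plus dimes
    print(ways_to_pay(50, coins=(1, 5, 10, 25))[50])   # plus quarters
    print(ways_to_pay(50)[50])                         # all five coins
    print(ways_to_pay(13)[13])
```

Restricting `coins` to a prefix of the denominations reproduces the intermediate rows of the table, which makes it easy to spot an arithmetic slip in a hand computation.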

More elegant formulas arise if we consider the problem of giving change
when we live in a land that mints coins of every positive integer denomination
(①, ②, ③, ...) instead of just the five we allowed before. The corresponding
generating function is an infinite product of fractions,

$$\frac{1}{(1 - z)(1 - z^2)(1 - z^3)\ldots},$$

and  the  coefficient  of  zn  when  these  factors  are  fully  multiplied  out  is  called 
p(n),  the  number  of  partitions  of  n.  A partition  of  n is  a representation  of  n 
as  a sum  of  positive  integers,  disregarding  order.  For  example,  there  are  seven 
different  partitions  of  5,  namely 

$$5 = 4+1 = 3+2 = 3+1+1 = 2+2+1 = 2+1+1+1 = 1+1+1+1+1;$$

hence $p(5) = 7$. (Also $p(2) = 2$, $p(3) = 3$, $p(4) = 5$, and $p(6) = 11$; it begins
to look as if p(n) is always a prime number. But $p(7) = 15$, spoiling the
pattern.) There is no closed form for p(n), but the theory of partitions is a
fascinating branch of mathematics in which many remarkable discoveries have
been made. For example, Ramanujan proved that $p(5n + 4) \equiv 0 \pmod 5$,
$p(7n + 5) \equiv 0 \pmod 7$, and $p(11n + 6) \equiv 0 \pmod{11}$, by making ingenious
transformations of generating functions (see Andrews [11, Chapter 10]).
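The infinite product can be multiplied out factor by factor, exactly as in the coin problem, since only the factors with exponent at most n affect the coefficient of $z^n$. The Python sketch below (the function name is hypothetical) tabulates p(n) this way and spot-checks the small values and the three congruences quoted above; a finite check of this kind is of course an illustration, not a proof.

```python
def partition_numbers(limit):
    """Return [p(0), p(1), ..., p(limit)] by expanding 1/((1-z)(1-z^2)...)."""
    p = [1] + [0] * limit
    for part in range(1, limit + 1):
        # Divide the series so far by (1 - z^part).
        for n in range(part, limit + 1):
            p[n] += p[n - part]
    return p


if __name__ == "__main__":
    p = partition_numbers(200)
    print(p[:8])
    print(all(p[5 * n + 4] % 5 == 0 for n in range(39)))
    print(all(p[7 * n + 5] % 7 == 0 for n in range(27)))
    print(all(p[11 * n + 6] % 11 == 0 for n in range(17)))
```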


If physicists can get away with viewing light sometimes as a wave and
sometimes as a particle, mathematicians should be able to view
generating functions in two different ways.

7.2 BASIC MANEUVERS

Now  let’s  look  more  closely  at  some  of  the  techniques  that  make 
power  series  powerful. 

First a few words about terminology and notation. Our generic generating
function has the form

$$G(z) = g_0 + g_1 z + g_2 z^2 + \cdots = \sum_{n\ge0} g_n z^n, \qquad (7.12)$$

and we say that G(z), or G for short, is the generating function for the
sequence $\langle g_0, g_1, g_2, \ldots\rangle$, which we also call $\langle g_n\rangle$. The coefficient $g_n$ of $z^n$
in G(z) is sometimes denoted $[z^n]\,G(z)$.

The sum in (7.12) runs over all $n \ge 0$, but we often find it more
convenient to extend the sum over all integers n. We can do this by simply
regarding $g_{-1} = g_{-2} = \cdots = 0$. In such cases we might still talk about the
sequence $\langle g_0, g_1, g_2, \ldots\rangle$, as if the $g_n$’s didn’t exist for negative n.

Two kinds of “closed forms” come up when we work with generating
functions. We might have a closed form for G(z), expressed in terms of z; or
we might have a closed form for $g_n$, expressed in terms of n. For example, the
generating function for Fibonacci numbers has the closed form $z/(1 - z - z^2)$;
the Fibonacci numbers themselves have the closed form $(\phi^n - \hat\phi^n)/\sqrt5$. The
context will explain what kind of closed form is meant.

Now  a few  words  about  perspective.  The  generating  function  G(z)  ap- 
pears to  be  two  different  entities,  depending  on  how  we  view  it.  Sometimes 
it  is  a function  of  a complex  variable  z,  satisfying  all  the  standard  properties 
proved  in  calculus  books.  And  sometimes  it  is  simply  a formal  power  series, 
with  z acting  as  a placeholder.  In  the  previous  section,  for  example,  we  used 
the  second  interpretation;  we  saw  several  examples  in  which  z was  substi- 
tuted for  some  feature  of  a combinatorial  object  in  a “sum”  of  such  objects. 
The  coefficient  of  zn  was  then  the  number  of  combinatorial  objects  having  n 
occurrences  of  that  feature. 

When we view G(z) as a function of a complex variable, its convergence
becomes an issue. We said in Chapter 2 that the infinite series $\sum_{n\ge0} g_n z^n$
converges (absolutely) if and only if there’s a bounding constant A such that
the finite sums $\sum_{0\le n\le N} |g_n z^n|$ never exceed A, for any N. Therefore it’s easy
to see that if $\sum_{n\ge0} g_n z^n$ converges for some value $z = z_0$, it also converges for
all z with $|z| < |z_0|$. Furthermore, we must have $\lim_{n\to\infty} |g_n z_0^n| = 0$; hence, in
the notation of Chapter 9, $g_n = O(|1/z_0|^n)$ if there is convergence at $z_0$. And
conversely if $g_n = O(M^n)$, the series $\sum_{n\ge0} g_n z^n$ converges for all $|z| < 1/M$.
These are the basic facts about convergence of power series.

But for our purposes convergence is usually a red herring, unless we’re
trying to study the asymptotic behavior of the coefficients. Nearly every
operation we perform on generating functions can be justified rigorously as
an operation on formal power series, and such operations are legal even when
the series don’t converge. (The relevant theory can be found, for example, in
Bell [19], Niven [225], and Henrici [151, Chapter 1].)

Furthermore, even if we throw all caution to the winds and derive formulas
without any rigorous justification, we generally can take the results of our
derivation and prove them by induction. For example, the generating function
for the Fibonacci numbers converges only when $|z| < 1/\phi \approx 0.618$, but
we didn’t need to know that when we proved the formula $F_n = (\phi^n - \hat\phi^n)/\sqrt5$.
The latter formula, once discovered, can be verified directly, if we don’t trust
the theory of formal power series. Therefore we’ll ignore questions of
convergence in this chapter; it’s more a hindrance than a help.

So much for perspective. Next we look at our main tools for reshaping
generating functions: adding, shifting, changing variables, differentiating,
integrating, and multiplying. In what follows we assume that, unless stated
otherwise, F(z) and G(z) are the generating functions for the sequences $\langle f_n\rangle$
and $\langle g_n\rangle$. We also assume that the $f_n$’s and $g_n$’s are zero for negative n, since
this saves us some bickering with the limits of summation.

It’s pretty obvious what happens when we add constant multiples of
F and G together:

$$\alpha F(z) + \beta G(z) = \alpha \sum_n f_n z^n + \beta \sum_n g_n z^n = \sum_n (\alpha f_n + \beta g_n) z^n. \qquad (7.13)$$

This gives us the generating function for the sequence $\langle \alpha f_n + \beta g_n\rangle$.

Shifting a generating function isn’t much harder. To shift G(z) right by
m places, that is, to form the generating function for the sequence
$\langle 0, \ldots, 0, g_0, g_1, \ldots\rangle = \langle g_{n-m}\rangle$ with m leading 0’s, we simply multiply by $z^m$:

$$z^m G(z) = \sum_n g_n z^{n+m} = \sum_n g_{n-m} z^n, \qquad \text{integer } m \ge 0. \qquad (7.14)$$

This is the operation we used (twice), along with addition, to deduce the
equation $(1 - z - z^2)F(z) = z$ on our way to finding a closed form for the
Fibonacci numbers in Chapter 6.

And to shift G(z) left m places (that is, to form the generating function
for the sequence $\langle g_m, g_{m+1}, g_{m+2}, \ldots\rangle = \langle g_{n+m}\rangle$ with the first m elements
discarded) we subtract off the first m terms and then divide by $z^m$:

$$\frac{G(z) - g_0 - g_1 z - \cdots - g_{m-1} z^{m-1}}{z^m} = \sum_{n\ge m} g_n z^{n-m} = \sum_{n\ge0} g_{n+m} z^n. \qquad (7.15)$$

(We can’t extend this last sum over all n unless $g_0 = \cdots = g_{m-1} = 0$.)

Even if we remove the tags from our mattresses.

Replacing the z by a constant multiple is another of our tricks:

$$G(cz) = \sum_n g_n (cz)^n = \sum_n c^n g_n z^n; \qquad (7.16)$$

this yields the generating function for the sequence $\langle c^n g_n\rangle$. The special case
$c = -1$ is particularly useful.

Often we want to bring down a factor of n into the coefficient.
Differentiation is what lets us do that:

$$G'(z) = g_1 + 2g_2 z + 3g_3 z^2 + \cdots = \sum_n (n+1)\, g_{n+1} z^n. \qquad (7.17)$$

Shifting this right one place gives us a form that’s sometimes more useful,

$$zG'(z) = \sum_n n\, g_n z^n. \qquad (7.18)$$

This is the generating function for the sequence $\langle n g_n\rangle$. Repeated
differentiation would allow us to multiply $g_n$ by any desired polynomial in n.

Integration, the inverse operation, lets us divide the terms by n:

$$\int_0^z G(t)\,dt = g_0 z + \tfrac12 g_1 z^2 + \tfrac13 g_2 z^3 + \cdots = \sum_{n\ge1} \frac1n\, g_{n-1} z^n. \qquad (7.19)$$

(Notice that the constant term is zero.) If we want the generating function
for $\langle g_n/n\rangle$ instead of $\langle g_{n-1}/n\rangle$, we should first shift left one place, replacing
G(t) by $(G(t) - g_0)/t$ in the integral.

Finally,  here’s  how  we  multiply  generating  functions  together: 

$$F(z)G(z) = (f_0 + f_1 z + f_2 z^2 + \cdots)(g_0 + g_1 z + g_2 z^2 + \cdots)$$

$$= (f_0 g_0) + (f_0 g_1 + f_1 g_0) z + (f_0 g_2 + f_1 g_1 + f_2 g_0) z^2 + \cdots$$

$$= \sum_n \Bigl( \sum_k f_k g_{n-k} \Bigr) z^n. \qquad (7.20)$$

As we observed in Chapter 5, this gives the generating function for the
sequence $\langle h_n\rangle$, the convolution of $\langle f_n\rangle$ and $\langle g_n\rangle$. The sum $h_n = \sum_k f_k g_{n-k}$ can
also be written $h_n = \sum_{k=0}^n f_k g_{n-k}$, because $f_k = 0$ when $k < 0$ and $g_{n-k} = 0$
when $k > n$. Multiplication/convolution is a little more complicated than
the other operations, but it’s very useful; so useful that we will spend all of
Section 7.5 below looking at examples of it.

Multiplication has several special cases that are worth considering as
operations in themselves. We’ve already seen one of these: When $F(z) = z^m$
we get the shifting operation (7.14). In that case the sum $h_n$ becomes the
single term $g_{n-m}$, because all $f_k$’s are 0 except for $f_m = 1$.

Table 320  Generating function manipulations.

    $\alpha F(z) + \beta G(z) = \sum_n (\alpha f_n + \beta g_n) z^n$

    $z^m G(z) = \sum_n g_{n-m} z^n$,  integer $m \ge 0$

    $\dfrac{G(z) - g_0 - g_1 z - \cdots - g_{m-1} z^{m-1}}{z^m} = \sum_{n\ge0} g_{n+m} z^n$,  integer $m \ge 0$

    $G(cz) = \sum_n c^n g_n z^n$

    $G'(z) = \sum_n (n+1)\, g_{n+1} z^n$

    $zG'(z) = \sum_n n\, g_n z^n$

    $\int_0^z G(t)\,dt = \sum_{n\ge1} \frac1n\, g_{n-1} z^n$

    $F(z)G(z) = \sum_n \bigl( \sum_k f_k g_{n-k} \bigr) z^n$

    $\dfrac{1}{1-z}\, G(z) = \sum_n \bigl( \sum_{k\le n} g_k \bigr) z^n$


Another useful special case arises when F(z) is the familiar function
$1/(1 - z) = 1 + z + z^2 + \cdots$; then all $f_k$’s (for $k \ge 0$) are 1 and we have
the important formula

$$\frac{1}{1-z}\, G(z) = \sum_n \Bigl( \sum_{k\ge0} g_{n-k} \Bigr) z^n = \sum_n \Bigl( \sum_{k\le n} g_k \Bigr) z^n. \qquad (7.21)$$

Multiplying a generating function by $1/(1-z)$ gives us the generating function
for the cumulative sums of the original sequence.

Table  320  summarizes  the  operations  we’ve  discussed  so  far.  To  use 
all  these  manipulations  effectively  it  helps  to  have  a healthy  repertoire  of 
generating  functions  in  stock.  Table  321  lists  the  simplest  ones;  we  can  use 
those  to  get  started  and  to  solve  quite  a few  problems. 
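The manipulations summarized in Table 320 can be tried out on truncated coefficient lists. The following Python sketch is only an illustration under stated assumptions: a generating function is represented by the list of its first few coefficients, every function name is hypothetical, and results are valid only up to the truncation length.

```python
from fractions import Fraction


def shift_right(g, m):
    """Coefficients of z^m G(z), truncated to the original length."""
    return [0] * m + g[:len(g) - m]


def shift_left(g, m):
    """Coefficients of (G(z) - g_0 - ... - g_{m-1} z^{m-1}) / z^m."""
    return g[m:]


def scale(g, c):
    """Coefficients of G(cz): the n-th coefficient is multiplied by c^n."""
    return [c ** n * x for n, x in enumerate(g)]


def derivative(g):
    """Coefficients of G'(z): (n + 1) g_{n+1}."""
    return [(n + 1) * g[n + 1] for n in range(len(g) - 1)]


def integral(g):
    """Coefficients of the integral of G from 0 to z: g_{n-1}/n for n >= 1."""
    return [0] + [Fraction(g[n - 1], n) for n in range(1, len(g))]


def multiply(f, g):
    """Convolution: coefficients of F(z) G(z), truncated."""
    size = min(len(f), len(g))
    return [sum(f[k] * g[n - k] for k in range(n + 1)) for n in range(size)]


def cumulative(g):
    """Coefficients of G(z)/(1 - z): running sums of the sequence."""
    return multiply([1] * len(g), g)


if __name__ == "__main__":
    ones = [1] * 8                          # 1/(1 - z)
    print(cumulative(ones))                 # running sums of all-ones
    print(derivative(ones))                 # derivative of 1/(1 - z)
    print(cumulative(scale(ones, -1)))      # running sums of 1, -1, 1, -1, ...
    print(integral(ones)[:4])               # start of the series for ln(1/(1 - z))
```

Taking cumulative sums of the all-ones sequence and differentiating it both lead to 1, 2, 3, ..., which matches the two derivations of $1/(1-z)^2$ that the text describes.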

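Each of the generating functions in Table 321 is important enough to
be memorized. Many of them are special cases of the others, and many of
them can be derived quickly from the others by using the basic operations of
Table 320; therefore the memory work isn’t very hard.

Hint: If the sequence consists of binomial coefficients, its generating
function usually involves a binomial, 1 ± z.

Table 321  Simple sequences and their generating functions.

    sequence                                                          generating function                          closed form

    $\langle 1,0,0,0,0,0,\ldots\rangle$                               $\sum_{n\ge0} [n=0]\,z^n$                    $1$
    $\langle 0,\ldots,0,1,0,0,\ldots\rangle$                          $\sum_{n\ge0} [n=m]\,z^n$                    $z^m$
    $\langle 1,1,1,1,1,1,\ldots\rangle$                               $\sum_{n\ge0} z^n$                           $\frac{1}{1-z}$
    $\langle 1,-1,1,-1,1,-1,\ldots\rangle$                            $\sum_{n\ge0} (-1)^n z^n$                    $\frac{1}{1+z}$
    $\langle 1,0,1,0,1,0,\ldots\rangle$                               $\sum_{n\ge0} [2\backslash n]\,z^n$          $\frac{1}{1-z^2}$
    $\langle 1,0,\ldots,0,1,0,\ldots,0,1,0,\ldots\rangle$             $\sum_{n\ge0} [m\backslash n]\,z^n$          $\frac{1}{1-z^m}$
    $\langle 1,2,3,4,5,6,\ldots\rangle$                               $\sum_{n\ge0} (n+1)\,z^n$                    $\frac{1}{(1-z)^2}$
    $\langle 1,2,4,8,16,32,\ldots\rangle$                             $\sum_{n\ge0} 2^n z^n$                       $\frac{1}{1-2z}$
    $\langle 1,4,6,4,1,0,0,\ldots\rangle$                             $\sum_{n\ge0} \binom{4}{n} z^n$              $(1+z)^4$
    $\langle 1,c,\binom{c}{2},\binom{c}{3},\ldots\rangle$             $\sum_{n\ge0} \binom{c}{n} z^n$              $(1+z)^c$
    $\langle 1,c,\binom{c+1}{2},\binom{c+2}{3},\ldots\rangle$         $\sum_{n\ge0} \binom{c+n-1}{n} z^n$          $\frac{1}{(1-z)^c}$
    $\langle 1,c,c^2,c^3,\ldots\rangle$                               $\sum_{n\ge0} c^n z^n$                       $\frac{1}{1-cz}$
    $\langle 1,\binom{m+1}{m},\binom{m+2}{m},\binom{m+3}{m},\ldots\rangle$   $\sum_{n\ge0} \binom{m+n}{m} z^n$     $\frac{1}{(1-z)^{m+1}}$
    $\langle 0,1,\frac12,\frac13,\frac14,\ldots\rangle$               $\sum_{n\ge1} \frac1n z^n$                   $\ln\frac{1}{1-z}$
    $\langle 0,1,-\frac12,\frac13,-\frac14,\ldots\rangle$             $\sum_{n\ge1} \frac{(-1)^{n+1}}{n} z^n$      $\ln(1+z)$
    $\langle 1,1,\frac12,\frac16,\frac1{24},\frac1{120},\ldots\rangle$   $\sum_{n\ge0} \frac{1}{n!} z^n$           $e^z$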

For example, let’s consider the sequence $\langle 1,2,3,4,\ldots\rangle$, whose generating
function $1/(1-z)^2$ is often useful. This generating function appears near the
middle of Table 321, and it’s also the special case $m = 1$ of
$\langle 1, \binom{m+1}{m}, \binom{m+2}{m}, \binom{m+3}{m}, \ldots\rangle$, which appears further down; it’s also the special case $c = 2$ of
the closely related sequence $\langle 1, c, \binom{c+1}{2}, \binom{c+2}{3}, \ldots\rangle$. We can derive it from the
generating function for $\langle 1,1,1,1,\ldots\rangle$ by taking cumulative sums as in (7.21);
that is, by dividing $1/(1-z)$ by $(1-z)$. Or we can derive it from $\langle 1,1,1,1,\ldots\rangle$
by differentiation, using (7.17).

The sequence $\langle 1,0,1,0,\ldots\rangle$ is another one whose generating function can
be obtained in many ways. We can obviously derive the formula $\sum_n z^{2n} =
1/(1-z^2)$ by substituting $z^2$ for z in the identity $\sum_n z^n = 1/(1-z)$; we can
also apply cumulative summation to the sequence $\langle 1,-1,1,-1,\ldots\rangle$, whose
generating function is $1/(1+z)$, getting $1/\bigl((1+z)(1-z)\bigr) = 1/(1-z^2)$. And
there’s also a third way, which is based on a general method for extracting
the even-numbered terms $\langle g_0, 0, g_2, 0, g_4, 0, \ldots\rangle$ of any given sequence: If we
add $G(-z)$ to $G(+z)$ we get

$$G(z) + G(-z) = \sum_n g_n \bigl(1 + (-1)^n\bigr) z^n = 2 \sum_n g_n [n \text{ even}]\, z^n;$$

therefore

$$\frac{G(z) + G(-z)}{2} = \sum_n g_{2n} z^{2n}. \qquad (7.22)$$

The odd-numbered terms can be extracted in a similar way,

$$\frac{G(z) - G(-z)}{2} = \sum_n g_{2n+1} z^{2n+1}. \qquad (7.23)$$

In the special case where $g_n = 1$ and $G(z) = 1/(1-z)$, the generating function
for $\langle 1,0,1,0,\ldots\rangle$ is $\frac12\bigl(G(z) + G(-z)\bigr) = \frac12\bigl(\frac{1}{1-z} + \frac{1}{1+z}\bigr) = \frac{1}{1-z^2}$.

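Let’s try this extraction trick on the generating function for Fibonacci
numbers. We know that $\sum_n F_n z^n = z/(1 - z - z^2)$; hence

$$\sum_n F_{2n} z^{2n} = \frac12 \Bigl( \frac{z}{1 - z - z^2} + \frac{-z}{1 + z - z^2} \Bigr)$$

$$= \frac12 \Bigl( \frac{z + z^2 - z^3 - z + z^2 + z^3}{(1 - z^2)^2 - z^2} \Bigr) = \frac{z^2}{1 - 3z^2 + z^4}.$$

This generates the sequence $\langle F_0, 0, F_2, 0, F_4, \ldots\rangle$; hence the sequence of
alternate F’s, $\langle F_0, F_2, F_4, F_6, \ldots\rangle = \langle 0, 1, 3, 8, \ldots\rangle$, has a simple generating function:

$$\sum_n F_{2n} z^n = \frac{z}{1 - 3z + z^2}. \qquad (7.24)$$

OK, OK, I’m convinced already.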
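The extraction of even-numbered terms can be replayed on coefficient lists. In the Python sketch below (hypothetical helper names, truncated series only), the even part is formed as $(G(z) + G(-z))/2$ coefficient by coefficient, and the resulting sequence of alternate Fibonacci numbers is multiplied by $1 - 3z + z^2$ to see that the product is z up to the truncation point.

```python
def fibonacci(count):
    """Return [F_0, F_1, ..., F_{count-1}]."""
    f = [0, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]


def even_part(g):
    """Coefficients of (G(z) + G(-z))/2 for an integer sequence g."""
    return [(x + (-1) ** n * x) // 2 for n, x in enumerate(g)]


def odd_part(g):
    """Coefficients of (G(z) - G(-z))/2 for an integer sequence g."""
    return [(x - (-1) ** n * x) // 2 for n, x in enumerate(g)]


def times_poly(poly, g):
    """Truncated product of a polynomial (coefficient list) with a series."""
    top = len(poly) - 1
    return [sum(poly[k] * g[n - k] for k in range(min(n, top) + 1))
            for n in range(len(g))]


if __name__ == "__main__":
    fib = fibonacci(12)
    evens = even_part(fib)
    print(evens)                       # F_0, 0, F_2, 0, F_4, ...
    alternate = evens[::2]             # F_0, F_2, F_4, ...
    print(alternate)
    print(times_poly([1, -3, 1], alternate))
```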
7.3 SOLVING RECURRENCES

Now  let’s  focus  our  attention  on  one  of  the  most  important  uses  of 
generating  functions:  the  solution  of  recurrence  relations. 

Given  a sequence  (gn)  that  satisfies  a given  recurrence,  we  seek  a closed 
form  for  gn  in  terms  of  n.  A solution  to  this  problem  via  generating  functions 
proceeds  in  four  steps  that  are  almost  mechanical  enough  to  be  programmed 
on  a computer: 

1 Write down a single equation that expresses $g_n$ in terms of other elements
of the sequence. This equation should be valid for all integers n, assuming
that $g_{-1} = g_{-2} = \cdots = 0$.

2 Multiply both sides of the equation by $z^n$ and sum over all n. This gives,
on the left, the sum $\sum_n g_n z^n$, which is the generating function G(z). The
right-hand side should be manipulated so that it becomes some other
expression involving G(z).

3 Solve the resulting equation, getting a closed form for G(z).

4 Expand G(z) into a power series and read off the coefficient of $z^n$; this is
a closed form for $g_n$.

This  method  works  because  the  single  function  G(z)  represents  the  entire 
sequence  (gn)  in  such  a way  that  many  manipulations  are  possible. 

Example  1:  Fibonacci  numbers  revisited. 

For example, let’s rerun the derivation of Fibonacci numbers from
Chapter 6. In that chapter we were feeling our way, learning a new method; now
we can be more systematic. The given recurrence is

    $g_0 = 0$;  $g_1 = 1$;
    $g_n = g_{n-1} + g_{n-2}$,  for $n \ge 2$.

We will find a closed form for $g_n$ by using the four steps above.

Step 1 tells us to write the recurrence as a “single equation” for $g_n$. We
could say

$$g_n = \begin{cases} 0, & \text{if } n \le 0; \\ 1, & \text{if } n = 1; \\ g_{n-1} + g_{n-2}, & \text{if } n > 1; \end{cases}$$

but this is cheating. Step 1 really asks for a formula that doesn’t involve a
case-by-case construction. The single equation

$$g_n = g_{n-1} + g_{n-2}$$

works for $n \ge 2$, and it also holds when $n \le 0$ (because we have $g_0 = 0$
and $g_{\text{negative}} = 0$). But when $n = 1$ we get 1 on the left and 0 on the right.
Fortunately the problem is easy to fix, since we can add $[n = 1]$ to the right;
this adds 1 when $n = 1$, and it makes no change when $n \ne 1$. So, we have

$$g_n = g_{n-1} + g_{n-2} + [n = 1];$$

this is the equation called for in Step 1.

Step 2 now asks us to transform the equation for $\langle g_n\rangle$ into an equation
for $G(z) = \sum_n g_n z^n$. The task is not difficult:

$$G(z) = \sum_n g_n z^n = \sum_n g_{n-1} z^n + \sum_n g_{n-2} z^n + \sum_n [n = 1]\, z^n$$

$$= \sum_n g_n z^{n+1} + \sum_n g_n z^{n+2} + z$$

$$= zG(z) + z^2 G(z) + z.$$

Step 3 is also simple in this case; we have

$$G(z) = \frac{z}{1 - z - z^2},$$

which of course comes as no surprise.
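Before looking for a closed form, it can be reassuring to expand a quotient of polynomials numerically. The Python sketch below is an illustration with a hypothetical function name: it uses the relation $Q(z)R(z) = P(z)$ to compute the coefficients of $R(z) = P(z)/Q(z)$ one at a time, assuming the constant term of Q is nonzero. Polynomials are given as coefficient lists, lowest degree first.

```python
from fractions import Fraction


def rational_series(P, Q, count):
    """First `count` power-series coefficients of P(z)/Q(z), with Q[0] != 0.

    Equating coefficients of z^n in Q(z) R(z) = P(z) gives
    r_n = (p_n - sum_{k>=1} q_k r_{n-k}) / q_0.
    """
    r = []
    for n in range(count):
        acc = Fraction(P[n]) if n < len(P) else Fraction(0)
        for k in range(1, min(n, len(Q) - 1) + 1):
            acc -= Q[k] * r[n - k]
        r.append(acc / Q[0])
    return r


if __name__ == "__main__":
    # z / (1 - z - z^2): the Fibonacci generating function.
    print([int(x) for x in rational_series([0, 1], [1, -1, -1], 10)])
    # (1 - z^3) / (1 - 4 z^3 + z^6): nonzero coefficients at multiples of 3.
    print([int(x) for x in rational_series([1, 0, 0, -1],
                                           [1, 0, 0, -4, 0, 0, 1], 13)])
```

Exact rational arithmetic is used so that a denominator with a constant term other than 1 still gives exact coefficients.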

Step 4 is the clincher. We carried it out in Chapter 6 by having a sudden
flash of inspiration; let’s go more slowly now, so that we can get through
Step 4 safely later, when we meet problems that are more difficult. What is
the coefficient of $z^n$ when $z/(1 - z - z^2)$ is expanded in a power series? More
generally, if we are given any rational function

$$R(z) = \frac{P(z)}{Q(z)},$$

where P and Q are polynomials, what is the coefficient $[z^n]\,R(z)$?

There’s one kind of rational function whose coefficients are particularly
nice, namely

$$\frac{a}{(1 - \rho z)^{m+1}} = \sum_{n\ge0} \binom{m+n}{m}\, a \rho^n z^n. \qquad (7.25)$$

(The case $\rho = 1$ appears in Table 321, and we can get the general formula
shown here by substituting $\rho z$ for z.) A finite sum of functions like (7.25),

$$S(z) = \frac{a_1}{(1 - \rho_1 z)^{m_1+1}} + \frac{a_2}{(1 - \rho_2 z)^{m_2+1}} + \cdots + \frac{a_l}{(1 - \rho_l z)^{m_l+1}}, \qquad (7.26)$$

also has nice coefficients,

$$[z^n]\,S(z) = a_1 \binom{m_1+n}{m_1} \rho_1^n + a_2 \binom{m_2+n}{m_2} \rho_2^n + \cdots + a_l \binom{m_l+n}{m_l} \rho_l^n. \qquad (7.27)$$

We will show that every rational function R(z) such that $R(0) \ne \infty$ can be
expressed in the form

$$R(z) = S(z) + T(z), \qquad (7.28)$$

where S(z) has the form (7.26) and T(z) is a polynomial. Therefore there is a
closed form for the coefficients $[z^n]\,R(z)$. Finding S(z) and T(z) is equivalent
to finding the “partial fraction expansion” of R(z).

Notice that $S(z) = \infty$ when z has the values $1/\rho_1$, ..., $1/\rho_l$. Therefore
the numbers $\rho_k$ that we need to find, if we’re going to succeed in expressing
R(z) in the desired form S(z) + T(z), must be the reciprocals of the numbers
$\alpha_k$ where $Q(\alpha_k) = 0$. (Recall that $R(z) = P(z)/Q(z)$, where P and Q are
polynomials; we have $R(z) = \infty$ only if $Q(z) = 0$.)

Suppose Q(z) has the form

$$Q(z) = q_0 + q_1 z + \cdots + q_m z^m, \qquad \text{where } q_0 \ne 0 \text{ and } q_m \ne 0.$$

The “reflected” polynomial

$$Q^R(z) = q_0 z^m + q_1 z^{m-1} + \cdots + q_m$$

has an important relation to Q(z):

$$Q^R(z) = q_0 (z - \rho_1) \ldots (z - \rho_m) \iff Q(z) = q_0 (1 - \rho_1 z) \ldots (1 - \rho_m z).$$

Thus, the roots of $Q^R$ are the reciprocals of the roots of Q, and vice versa.
We can therefore find the numbers $\rho_k$ we seek by factoring the reflected
polynomial $Q^R(z)$.

For example, in the Fibonacci case we have

$$Q(z) = 1 - z - z^2; \qquad Q^R(z) = z^2 - z - 1.$$

The roots of $Q^R$ can be found by setting $(a, b, c) = (1, -1, -1)$ in the
quadratic formula $(-b \pm \sqrt{b^2 - 4ac})/2a$; we find that they are

$$\phi = \frac{1 + \sqrt5}{2} \qquad \text{and} \qquad \hat\phi = \frac{1 - \sqrt5}{2}.$$

Therefore $Q^R(z) = (z - \phi)(z - \hat\phi)$ and $Q(z) = (1 - \phi z)(1 - \hat\phi z)$.

Once we’ve found the $\rho$’s, we can proceed to find the partial fraction
expansion. It’s simplest if all the roots are distinct, so let’s consider that
special case first. We might as well state and prove the general result formally:

Rational Expansion Theorem for Distinct Roots.

If $R(z) = P(z)/Q(z)$, where $Q(z) = q_0 (1 - \rho_1 z) \ldots (1 - \rho_l z)$ and the
numbers $(\rho_1, \ldots, \rho_l)$ are distinct, and if P(z) is a polynomial of degree less
than $l$, then

$$[z^n]\,R(z) = a_1 \rho_1^n + \cdots + a_l \rho_l^n, \qquad \text{where } a_k = \frac{-\rho_k P(1/\rho_k)}{Q'(1/\rho_k)}. \qquad (7.29)$$


Proof: Let $a_1, \ldots, a_l$ be the stated constants. Formula (7.29) holds if $R(z) =
P(z)/Q(z)$ is equal to

$$S(z) = \frac{a_1}{1-\rho_1 z} + \cdots + \frac{a_l}{1-\rho_l z}.$$

And we can prove that $R(z) = S(z)$ by showing that the function $T(z) =
R(z) - S(z)$ is not infinite as $z \to 1/\rho_k$. For this will show that the rational
function $T(z)$ is never infinite; hence $T(z)$ must be a polynomial. We also can
show that $T(z) \to 0$ as $z \to \infty$; hence $T(z)$ must be zero.

Let $\alpha_k = 1/\rho_k$. To prove that $\lim_{z\to\alpha_k} T(z) \ne \infty$, it suffices to show that
$\lim_{z\to\alpha_k} (z-\alpha_k)T(z) = 0$, because $T(z)$ is a rational function of $z$. Thus we
want to show that

$$\lim_{z\to\alpha_k} (z-\alpha_k)R(z) = \lim_{z\to\alpha_k} (z-\alpha_k)S(z).$$

The right-hand limit equals $\lim_{z\to\alpha_k} a_k(z-\alpha_k)/(1-\rho_k z) = -a_k/\rho_k$, because
$(1-\rho_k z) = -\rho_k(z-\alpha_k)$ and $(z-\alpha_k)/(1-\rho_j z) \to 0$ for $j \ne k$. The left-hand
limit is

$$\lim_{z\to\alpha_k} (z-\alpha_k)\frac{P(z)}{Q(z)} = P(\alpha_k)\lim_{z\to\alpha_k}\frac{z-\alpha_k}{Q(z)} = \frac{P(\alpha_k)}{Q'(\alpha_k)},$$

by L’Hospital’s rule. The two limits agree because $a_k = -\rho_k P(\alpha_k)/Q'(\alpha_k)$
by definition. Thus the theorem is proved.

Returning to the Fibonacci example, we have $P(z) = z$ and $Q(z) =
1 - z - z^2 = (1-\phi z)(1-\hat\phi z)$; hence $Q'(z) = -1 - 2z$, and

$$\frac{-\rho P(1/\rho)}{Q'(1/\rho)} = \frac{-1}{-1-2/\rho} = \frac{\rho}{\rho+2}.$$

According to (7.29), the coefficient of $\phi^n$ in $[z^n]\,R(z)$ is therefore $\phi/(\phi+2) =
1/\sqrt5$; the coefficient of $\hat\phi^n$ is $\hat\phi/(\hat\phi+2) = -1/\sqrt5$. So the theorem tells us
that $F_n = (\phi^n - \hat\phi^n)/\sqrt5$, as in (6.123).
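As a numerical sanity check, formula (7.29) can be evaluated directly. The following is only a sketch in floating-point arithmetic; the helper names `poly_eval`, `poly_deriv`, and `expansion_coefficients` are illustrative and do not come from the text, and polynomials are assumed to be stored as coefficient lists `[c0, c1, c2, ...]`.

```python
import math


def poly_eval(coeffs, z):
    """Evaluate c0 + c1*z + c2*z^2 + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * z + c
    return result


def poly_deriv(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def expansion_coefficients(P, Q, rhos):
    """Return a_k = -rho_k * P(1/rho_k) / Q'(1/rho_k), as in (7.29).

    Assumes the rho_k are distinct and deg P < number of roots.
    """
    dQ = poly_deriv(Q)
    return [-rho * poly_eval(P, 1 / rho) / poly_eval(dQ, 1 / rho)
            for rho in rhos]


# Fibonacci case: P(z) = z, Q(z) = 1 - z - z^2.
phi = (1 + math.sqrt(5)) / 2
phi_hat = (1 - math.sqrt(5)) / 2
a = expansion_coefficients([0, 1], [1, -1, -1], [phi, phi_hat])


def fib_closed(n):
    """F_n computed from the expansion a_1*phi^n + a_2*phi_hat^n."""
    return a[0] * phi**n + a[1] * phi_hat**n
```

Because this uses floating point, `fib_closed(n)` should be rounded before comparing it with the integer $F_n$, and it loses accuracy for large $n$.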


[Margin note: Impress your parents by leaving the book open at this page.]
When $Q(z)$ has repeated roots, the calculations become more difficult,
but we can beef up the proof of the theorem and prove the following more
general result:

General Expansion Theorem for Rational Generating Functions.

If $R(z) = P(z)/Q(z)$, where $Q(z) = q_0(1-\rho_1 z)^{d_1}\ldots(1-\rho_l z)^{d_l}$ and the
numbers $(\rho_1, \ldots, \rho_l)$ are distinct, and if $P(z)$ is a polynomial of degree less
than $d_1 + \cdots + d_l$, then

$$[z^n]\,R(z) = f_1(n)\rho_1^n + \cdots + f_l(n)\rho_l^n \quad\text{for all } n \ge 0, \qquad (7.30)$$

where each $f_k(n)$ is a polynomial of degree $d_k - 1$ with leading coefficient

$$a_k = \frac{(-\rho_k)^{d_k}\,P(1/\rho_k)\,d_k}{Q^{(d_k)}(1/\rho_k)} = \frac{P(1/\rho_k)}{(d_k-1)!\;q_0\prod_{j\ne k}(1-\rho_j/\rho_k)^{d_j}}. \qquad (7.31)$$

This can be proved by induction on $\max(d_1, \ldots, d_l)$, using the fact that

$$R(z) - \frac{a_1(d_1-1)!}{(1-\rho_1 z)^{d_1}} - \cdots - \frac{a_l(d_l-1)!}{(1-\rho_l z)^{d_l}}$$

is a rational function whose denominator polynomial is not divisible by
$(1-\rho_k z)^{d_k}$ for any $k$.

Example  2:  A more-or-less  random  recurrence. 

Now  that  we’ve  seen  some  general  methods,  we’re  ready  to  tackle  new 
problems.  Let’s  try  to  find  a closed  form  for  the  recurrence 


$$g_0 = g_1 = 1; \qquad g_n = g_{n-1} + 2g_{n-2} + (-1)^n \quad\text{for } n \ge 2. \qquad (7.32)$$


It’s  always  a good  idea  to  make  a table  of  small  cases  first,  and  the  recurrence 
lets  us  do  that  easily: 


n         0    1    2    3    4    5    6    7
(-1)^n    1   -1    1   -1    1   -1    1   -1
g_n       1    1    4    5   14   23   52   97

No  closed  form  is  evident,  and  this  sequence  isn’t  even  listed  in  Sloane’s 
Handbook  [270];  so  we  need  to  go  through  the  four-step  process  if  we  want 
to  discover  the  solution. 
Step 1 is easy, since we merely need to insert fudge factors to fix things
when $n < 2$: The equation

$$g_n = g_{n-1} + 2g_{n-2} + (-1)^n[n \ge 0] + [n = 1]$$

holds for all integers $n$. Now we can carry out Step 2:

$$G(z) = \sum_n g_n z^n = \sum_n g_{n-1}z^n + 2\sum_n g_{n-2}z^n + \sum_{n\ge0}(-1)^n z^n + \sum_{n=1} z^n$$
$$\phantom{G(z)} = zG(z) + 2z^2G(z) + \frac{1}{1+z} + z.$$

[Margin note: N.B.: The upper index on $\sum_{n=1} z^n$ is not missing!]

(Incidentally, we could also have used $\binom{-1}{n}$ instead of $(-1)^n[n \ge 0]$, thereby
getting $\sum_n \binom{-1}{n}z^n = (1+z)^{-1}$ by the binomial theorem.) Step 3 is elementary
algebra, which yields

$$G(z) = \frac{1 + z(1+z)}{(1+z)(1-z-2z^2)} = \frac{1+z+z^2}{(1-2z)(1+z)^2}.$$

And that leaves us with Step 4.

The squared factor in the denominator is a bit troublesome, since we
know that repeated roots are more complicated than distinct roots; but there
it is. We have two roots, $\rho_1 = 2$ and $\rho_2 = -1$; the general expansion theorem
(7.30) tells us that

$$g_n = a_1 2^n + (a_2 n + c)(-1)^n$$

for some constant $c$, where

$$a_1 = \frac{1 + 1/2 + 1/4}{(1 + 1/2)^2} = \frac79; \qquad a_2 = \frac{1 - 1 + 1}{1 - 2/(-1)} = \frac13.$$

(The second formula for $a_k$ in (7.31) is easier to use than the first one when
the denominator has nice factors. We simply substitute $z = 1/\rho_k$ everywhere
in $R(z)$, except in the factor where this gives zero, and divide by $(d_k - 1)!$; this
gives the coefficient of $n^{d_k-1}\rho_k^n$.) Plugging in $n = 0$ tells us that the value of
the remaining constant $c$ had better be $\frac29$; hence our answer is

$$g_n = \tfrac79\,2^n + \bigl(\tfrac13 n + \tfrac29\bigr)(-1)^n. \qquad (7.33)$$

It doesn’t hurt to check the cases $n = 1$ and 2, just to be sure that we didn’t
foul up. Maybe we should even try $n = 3$, since this formula looks weird. But
it’s correct, all right.

Could we have discovered (7.33) by guesswork? Perhaps after tabulating
a few more values we may have observed that $g_{n+1} \approx 2g_n$ when $n$ is large.
And with chutzpah and luck we might even have been able to smoke out
the constant $\frac79$. But it sure is simpler and more reliable to have generating
functions as a tool.
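The check of (7.33) against recurrence (7.32) is easy to automate. This is a small sketch using exact rational arithmetic; the function names `g_recurrence` and `g_closed` are illustrative only.

```python
from fractions import Fraction


def g_recurrence(count):
    """First `count` values of g_n from (7.32): g_0 = g_1 = 1,
    g_n = g_{n-1} + 2 g_{n-2} + (-1)^n for n >= 2."""
    g = [1, 1]
    for n in range(2, count):
        g.append(g[n - 1] + 2 * g[n - 2] + (-1) ** n)
    return g[:count]


def g_closed(n):
    """Closed form (7.33), evaluated exactly with rationals."""
    return Fraction(7, 9) * 2**n + (Fraction(n, 3) + Fraction(2, 9)) * (-1) ** n


table = g_recurrence(8)          # the small cases tabulated in the text
matches = all(g_closed(n) == g for n, g in enumerate(g_recurrence(40)))
```

Using `Fraction` avoids any rounding in the ninths, so agreement here is exact rather than approximate.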

Example  3:  Mutually  recursive  sequences. 

Sometimes  we  have  two  or  more  recurrences  that  depend  on  each  other. 
Then  we  can  form  generating  functions  for  both  of  them,  and  solve  both  by 
a simple  extension  of  our  four-step  method. 

For example, let’s return to the problem of $3 \times n$ domino tilings that we
explored earlier this chapter. If we want to know only the total number of
ways, $U_n$, to cover a $3 \times n$ rectangle with dominoes, without breaking this
number down into vertical dominoes versus horizontal dominoes, we needn’t
go into as much detail as we did before. We can merely set up the recurrences

$$U_0 = 1, \quad U_1 = 0; \qquad V_0 = 0, \quad V_1 = 1;$$
$$U_n = 2V_{n-1} + U_{n-2}, \qquad V_n = U_{n-1} + V_{n-2}, \qquad\text{for } n \ge 2.$$

Here  Vn  is  the  number  of  ways  to  cover  a 3 x n rectangle-minus-corner,  using 
(3n  — l)/2  dominoes.  These  recurrences  are  easy  to  discover,  if  we  consider 
the  possible  domino  configurations  at  the  rectangle’s  left  edge,  as  before.  Here 
are  the  values  of  Un  and  Vn  for  small  n: 


n      0   1   2   3   4   5   6   7
U_n    1   0   3   0  11   0  41   0        (7.34)
V_n    0   1   0   4   0  15   0  56

Let’s find closed forms, in four steps. First (Step 1), we have

$$U_n = 2V_{n-1} + U_{n-2} + [n = 0], \qquad V_n = U_{n-1} + V_{n-2},$$

for all $n$. Hence (Step 2),

$$U(z) = 2zV(z) + z^2U(z) + 1, \qquad V(z) = zU(z) + z^2V(z).$$

Now (Step 3) we must solve two equations in two unknowns; but these are
easy, since the second equation yields $V(z) = zU(z)/(1 - z^2)$; we find

$$U(z) = \frac{1 - z^2}{1 - 4z^2 + z^4}, \qquad V(z) = \frac{z}{1 - 4z^2 + z^4}. \qquad (7.35)$$


(We  had  this  formula  for  U(z)  in  (7.10),  but  with  z3  instead  of  z2.  In  that 
derivation,  n was  the  number  of  dominoes;  now  it’s  the  width  of  the  rectangle.) 

The denominator $1 - 4z^2 + z^4$ is a function of $z^2$; this is what makes
$U_{2n+1} = 0$ and $V_{2n} = 0$, as they should be. We can take advantage of this
nice property of $z^2$ by retaining $z^2$ when we factor the denominator: We need
not take $1 - 4z^2 + z^4$ all the way to a product of four factors $(1 - \rho_k z)$, since
two factors of the form $(1 - \rho_k z^2)$ will be enough to tell us the coefficients. In
other words if we consider the generating function

$$W(z) = \frac{1}{1 - 4z + z^2} = W_0 + W_1 z + W_2 z^2 + \cdots, \qquad (7.36)$$

we will have $V(z) = zW(z^2)$ and $U(z) = (1 - z^2)W(z^2)$; hence $V_{2n+1} = W_n$
and $U_{2n} = W_n - W_{n-1}$. We save time and energy by working with the simpler
function $W(z)$.

The factors of $1 - 4z + z^2$ are $(z - 2 - \sqrt3)$ and $(z - 2 + \sqrt3)$, and they can
also be written $(1 - (2+\sqrt3)z)$ and $(1 - (2-\sqrt3)z)$ because this polynomial
is its own reflection. Thus it turns out that we have

$$V_{2n+1} = W_n = \frac{3+2\sqrt3}{6}(2+\sqrt3)^n + \frac{3-2\sqrt3}{6}(2-\sqrt3)^n;$$

$$U_{2n} = W_n - W_{n-1} = \frac{3+\sqrt3}{6}(2+\sqrt3)^n + \frac{3-\sqrt3}{6}(2-\sqrt3)^n = \frac{(2+\sqrt3)^n}{3-\sqrt3} + \frac{(2-\sqrt3)^n}{3+\sqrt3}. \qquad (7.37)$$


This  is  the  desired  closed  form  for  the  number  of  3 x n domino  tilings. 

Incidentally, we can simplify the formula for $U_{2n}$ by realizing that the
second term always lies between 0 and 1. The number $U_{2n}$ is an integer, so
we have

$$U_{2n} = \left\lceil \frac{(2+\sqrt3)^n}{3-\sqrt3} \right\rceil, \quad\text{for } n \ge 0. \qquad (7.38)$$

In fact, the other term $(2-\sqrt3)^n/(3+\sqrt3)$ is extremely small when $n$ is
large, because $2 - \sqrt3 \approx 0.268$. This needs to be taken into account if we
try to use formula (7.38) in numerical calculations. For example, a fairly
expensive name-brand hand calculator comes up with 413403.0005 when asked
to compute $(2+\sqrt3)^{10}/(3-\sqrt3)$. This is correct to nine significant figures;
but the true value is slightly less than 413403, not slightly greater. Therefore
it would be a mistake to take the ceiling of 413403.0005; the correct answer,
$U_{20} = 413403$, is obtained by rounding to the nearest integer. Ceilings can
be hazardous.
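The warning about ceilings can be explored with a short experiment. The sketch below computes $U_n$ and $V_n$ exactly from the recurrences and, separately, evaluates the dominant term of (7.37) in floating point; the names `tilings` and `u_even_float` are illustrative, and double precision is assumed.

```python
import math


def tilings(n_max):
    """Exact U_n and V_n for 0 <= n <= n_max from the mutual recurrences
    U_n = 2 V_{n-1} + U_{n-2},  V_n = U_{n-1} + V_{n-2}."""
    U = [1, 0]
    V = [0, 1]
    for n in range(2, n_max + 1):
        U.append(2 * V[n - 1] + U[n - 2])
        V.append(U[n - 1] + V[n - 2])
    return U[:n_max + 1], V[:n_max + 1]


def u_even_float(n):
    """Floating-point value of (2 + sqrt 3)^n / (3 - sqrt 3),
    the dominant term of U_{2n} in (7.37)."""
    s = math.sqrt(3)
    return (2 + s) ** n / (3 - s)


U, V = tilings(20)
approx = u_even_float(10)        # close to, but not exactly, U_20
rounded = round(approx)          # rounding to nearest is the safe choice
```

Rounding tolerates a small error in either direction, whereas `math.ceil` gives the wrong integer whenever accumulated rounding error pushes the computed value above the true integer. For large $n$ the floating-point value no longer has enough digits at all, and the exact integer recurrence is the only reliable route.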


I’ve  known  slippery 
floors  too. 


Example  4:  A closed  form  for  change. 

When we left the problem of making change, we had just calculated the
number of ways to pay 50¢. Let’s try now to count the number of ways there
are to change a dollar, or a million dollars, still using only pennies, nickels,
dimes, quarters, and halves.
The generating function derived earlier is

$$C(z) = \frac{1}{1-z}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}\,\frac{1}{1-z^{25}}\,\frac{1}{1-z^{50}};$$

this is a rational function of $z$ with a denominator of degree 91. Therefore
we can decompose the denominator into 91 factors and come up with a
91-term “closed form” for $C_n$, the number of ways to give $n$ cents in change.
But that’s too horrible to contemplate. Can’t we do better than the general
method suggests, in this particular case?

One ray of hope suggests itself immediately, when we notice that the
denominator is almost a function of $z^5$. The trick we just used to simplify the
calculations by noting that $1 - 4z^2 + z^4$ is a function of $z^2$ can be applied to
$C(z)$, if we replace $1/(1-z)$ by $(1 + z + z^2 + z^3 + z^4)/(1 - z^5)$:

$$C(z) = \frac{1+z+z^2+z^3+z^4}{1-z^5}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}\,\frac{1}{1-z^{25}}\,\frac{1}{1-z^{50}} = (1+z+z^2+z^3+z^4)\,\check C(z^5),$$

$$\check C(z) = \frac{1}{1-z}\,\frac{1}{1-z}\,\frac{1}{1-z^2}\,\frac{1}{1-z^5}\,\frac{1}{1-z^{10}}.$$


Now  we’re  also 
getting  compressed 
reasoning. 


The compressed function $\check C(z)$ has a denominator whose degree is only 19,
so it’s much more tractable than the original. This new expression for $C(z)$
shows us, incidentally, that $C_{5n} = C_{5n+1} = C_{5n+2} = C_{5n+3} = C_{5n+4}$; and
indeed, this set of equations is obvious in retrospect: The number of ways to
leave a 53¢ tip is the same as the number of ways to leave a 50¢ tip, because
the number of pennies is predetermined modulo 5.

But $\check C(z)$ still doesn’t have a really simple closed form based on the roots
of the denominator. The easiest way to compute the coefficients of $\check C(z)$ is
probably to recognize that each of the denominator factors is a divisor of
$1 - z^{10}$. Hence we can write

$$\check C(z) = \frac{A(z)}{(1 - z^{10})^5}, \quad\text{where } A(z) = A_0 + A_1 z + \cdots + A_{31}z^{31}. \qquad (7.39)$$


The actual value of $A(z)$, for the curious, is

$(1 + z + \cdots + z^9)^2\,(1 + z^2 + \cdots + z^8)\,(1 + z^5)$

$= 1 + 2z + 4z^2 + 6z^3 + 9z^4 + 13z^5 + 18z^6 + 24z^7$
$+ 31z^8 + 39z^9 + 45z^{10} + 52z^{11} + 57z^{12} + 63z^{13} + 67z^{14} + 69z^{15}$
$+ 69z^{16} + 67z^{17} + 63z^{18} + 57z^{19} + 52z^{20} + 45z^{21} + 39z^{22} + 31z^{23}$
$+ 24z^{24} + 18z^{25} + 13z^{26} + 9z^{27} + 6z^{28} + 4z^{29} + 2z^{30} + z^{31}.$
Finally, since $1/(1 - z^{10})^5 = \sum_{k\ge0}\binom{k+4}{4}z^{10k}$, we can determine the coefficient
$\check C_n = [z^n]\,\check C(z)$ as follows, when $n = 10q + r$ and $0 \le r < 10$:

$$\check C_{10q+r} = \sum_{j,k} A_j\binom{k+4}{4}[10q + r = 10k + j]$$
$$\phantom{\check C_{10q+r}} = A_r\binom{q+4}{4} + A_{r+10}\binom{q+3}{4} + A_{r+20}\binom{q+2}{4} + A_{r+30}\binom{q+1}{4}. \qquad (7.40)$$

This gives ten cases, one for each value of $r$; but it’s a pretty good closed
form, compared with alternatives that involve powers of complex numbers.

For example, we can use this expression to deduce the value of $C_{50q} =
\check C_{10q}$. Then $r = 0$ and we have

$$C_{50q} = \binom{q+4}{4} + 45\binom{q+3}{4} + 52\binom{q+2}{4} + 2\binom{q+1}{4}.$$

The number of ways to change 50¢ is $\binom54 + 45\binom44 = 50$; the number of ways
to change \$1 is $\binom64 + 45\binom54 + 52\binom44 = 292$; and the number of ways to change
\$1,000,000 is

$$\binom{2000004}{4} + 45\binom{2000003}{4} + 52\binom{2000002}{4} + 2\binom{2000001}{4} = 66666793333412666685000001.$$
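Formula (7.40) is simple enough to code directly and compare with a brute-force count. In this sketch, `A` is built by multiplying out the three factors given for $A(z)$, `change_formula` applies (7.40) after compressing by $C_n = \check C_{\lfloor n/5\rfloor}$, and `change_dp` is an independent dynamic-programming count; all three names are illustrative and not from the text.

```python
from math import comb


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def geometric(step, top):
    """Coefficients of 1 + z^step + z^(2*step) + ... + z^top."""
    return [1 if k % step == 0 else 0 for k in range(top + 1)]


# A(z) = (1 + z + ... + z^9)^2 (1 + z^2 + ... + z^8) (1 + z^5), degree 31.
A = poly_mul(poly_mul(geometric(1, 9), geometric(1, 9)),
             poly_mul(geometric(2, 8), geometric(5, 5)))


def change_formula(cents):
    """Number of ways to change `cents`, via (7.40)."""
    q, r = divmod(cents // 5, 10)
    total = 0
    for i in range(4):
        j = r + 10 * i
        if j < len(A):
            total += A[j] * comb(q + 4 - i, 4)
    return total


def change_dp(cents, coins=(1, 5, 10, 25, 50)):
    """Independent check: classic coin-by-coin dynamic programming."""
    ways = [1] + [0] * cents
    for c in coins:
        for amount in range(c, cents + 1):
            ways[amount] += ways[amount - c]
    return ways[cents]
```

The formula needs only four binomial coefficients for any amount, so `change_formula(100_000_000)` is immediate, while the table-based count grows linearly with the amount.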


Example  5:  A divergent  series. 

Now let’s try to get a closed form for the numbers $g_n$ defined by

$$g_0 = 1; \qquad g_n = n\,g_{n-1}, \quad\text{for } n > 0.$$

After staring at this for a few nanoseconds we realize that $g_n$ is just $n!$; in
fact, the method of summation factors described in Chapter 2 suggests this
answer immediately. But let’s try to solve the recurrence with generating
functions, just to see what happens. (A powerful technique should be able to
handle easy recurrences like this, as well as others that have answers we can’t
guess so easily.)

[Margin note: Nowadays people are talking femtoseconds.]

The equation

$$g_n = n\,g_{n-1} + [n = 0]$$

holds for all $n$, and it leads to

$$G(z) = \sum_n g_n z^n = \sum_n n\,g_{n-1}z^n + \sum_{n=0} z^n.$$

To complete Step 2, we want to express $\sum_n n\,g_{n-1}z^n$ in terms of $G(z)$, and the
basic maneuvers in Table 320 suggest that the derivative $G'(z) = \sum_n n\,g_n z^{n-1}$
is somehow involved. So we steer toward that kind of sum:

$$G(z) = 1 + \sum_n (n+1)\,g_n z^{n+1}$$
$$\phantom{G(z)} = 1 + \sum_n n\,g_n z^{n+1} + \sum_n g_n z^{n+1}$$
$$\phantom{G(z)} = 1 + z^2G'(z) + zG(z).$$

[Margin note: “This will be quick.” That’s what the doctor said just before he
stuck me with that needle. Come to think of it, “hypergeometric” sounds a lot
like “hypodermic.”]


Let’s check this equation, using the values of $g_n$ for small $n$. Since

$$G = 1 + z + 2z^2 + 6z^3 + 24z^4 + \cdots,$$
$$G' = 1 + 4z + 18z^2 + 96z^3 + \cdots,$$

we have

$$z^2G' = z^2 + 4z^3 + 18z^4 + 96z^5 + \cdots,$$
$$zG = z + z^2 + 2z^3 + 6z^4 + 24z^5 + \cdots,$$
$$1 = 1.$$

These three lines add up to $G$, so we’re fine so far. Incidentally, we often find
it convenient to write ‘$G$’ instead of ‘$G(z)$’; the extra ‘$(z)$’ just clutters up the
formula when we aren’t changing $z$.

Step 3 is next, and it’s different from what we’ve done before because we
have a differential equation to solve. But this is a differential equation that
we can handle with the hypergeometric series techniques of Section 5.6; those
techniques aren’t too bad. (Readers who are unfamiliar with hypergeometrics
needn’t worry; this will be quick.)

First we must get rid of the constant ‘1’, so we take the derivative of
both sides:

$$G' = (z^2G' + zG + 1)' = (2zG' + z^2G'') + (G + zG') = z^2G'' + 3zG' + G.$$

The theory in Chapter 5 tells us to rewrite this using the $\vartheta$ operator, and we
know from exercise 6.13 that

$$\vartheta G = zG', \qquad \vartheta^2 G = z^2G'' + zG'.$$

Therefore the desired form of the differential equation is

$$\vartheta G = z\vartheta^2 G + 2z\vartheta G + zG = z(\vartheta + 1)^2 G.$$

According to (5.109), the solution with $g_0 = 1$ is the hypergeometric series
$F(1, 1; ; z)$.
Step 3 was more than we bargained for; but now that we know what the
function $G$ is, Step 4 is easy: the hypergeometric definition (5.76) gives us
the power series expansion

$$G(z) = F(1, 1; ; z) = \sum_{n\ge0}\frac{1^{\overline n}\,1^{\overline n}\,z^n}{n!} = \sum_{n\ge0} n!\,z^n.$$

We’ve confirmed the closed form we knew all along, $g_n = n!$.

Notice that the technique gave the right answer even though $G(z)$
diverges for all nonzero $z$. The sequence $n!$ grows so fast, the terms $|n!\,z^n|$
approach $\infty$ as $n \to \infty$, unless $z = 0$. This shows that formal power series
can be manipulated algebraically without worrying about convergence.
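Since only coefficients matter in a formal power series, the functional equation $G = 1 + z^2G' + zG$ can be checked on truncated coefficient lists with no reference to convergence. A minimal sketch follows; the truncation length `N` and the list names are arbitrary choices made for illustration.

```python
from math import factorial

N = 12                                             # number of coefficients kept
G = [factorial(n) for n in range(N)]               # g_n = n!
dG = [(n + 1) * G[n + 1] for n in range(N - 1)]    # coefficients of G'(z)

# Build the coefficients of 1 + z^2 G'(z) + z G(z), truncated below degree N.
rhs = [0] * N
rhs[0] += 1                                        # the constant term 1
for n, c in enumerate(dG):                         # z^2 G' shifts indices by 2
    if n + 2 < N:
        rhs[n + 2] += c
for n, c in enumerate(G):                          # z G shifts indices by 1
    if n + 1 < N:
        rhs[n + 1] += c
```

Here `rhs` reproduces `G` coefficient by coefficient, which is all the identity claims; nothing is ever evaluated at a nonzero $z$.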


Example 6: A recurrence that goes all the way back.

Let’s close this section by applying generating functions to a problem in
graph theory. A fan of order $n$ is a graph on the vertices $\{0, 1, \ldots, n\}$ with
$2n - 1$ edges defined as follows: Vertex 0 is connected by an edge to each of
the other $n$ vertices, and vertex $k$ is connected by an edge to vertex $k + 1$, for
$1 \le k < n$. Here, for example, is the fan of order 4, which has five vertices
and seven edges.

[Figure: the fan of order 4]

The problem of interest: How many spanning trees $f_n$ are in such a graph?
A spanning tree is a subgraph containing all the vertices, and containing
enough edges to make the subgraph connected yet not so many that it has
a cycle. It turns out that every spanning tree of a graph on $n + 1$ vertices
has exactly $n$ edges. With fewer than $n$ edges the subgraph wouldn’t be
connected, and with more than $n$ it would have a cycle; graph theory books
prove this.

There are $\binom{2n-1}{n}$ ways to choose $n$ edges from among the $2n - 1$ present
in a fan of order $n$, but these choices don’t always yield a spanning tree. For
instance the subgraph

[Figure: a four-edge subgraph of the fan of order 4]

has four edges but is not a spanning tree; it has a cycle from 0 to 4 to 3 to 0,
and it has no connection between $\{1, 2\}$ and the other vertices. We want to
count how many of the $\binom{2n-1}{n}$ choices actually do yield spanning trees.
Let’s look at some small cases. It’s pretty easy to enumerate the spanning
trees for $n = 1$, 2, and 3:

[Figure: the spanning trees of the fans of order 1, 2, and 3]

$$f_1 = 1 \qquad f_2 = 3 \qquad f_3 = 8$$

(We need not show the labels on the vertices, if we always draw vertex 0 at
the left.) What about the case $n = 0$? At first it seems reasonable to set
$f_0 = 1$; but we’ll take $f_0 = 0$, because the existence of a fan of order 0 (which
should have $2n - 1 = -1$ edges) is dubious.

Our four-step procedure tells us to find a recurrence for $f_n$ that holds for
all $n$. We can get a recurrence by observing how the topmost vertex (vertex $n$)
is connected to the rest of the spanning tree. If it’s not connected to vertex 0,
it must be connected to vertex $n - 1$, since it must be connected to the rest of
the graph. In this case, any of the $f_{n-1}$ spanning trees for the remaining fan
(on the vertices 0 through $n - 1$) will complete a spanning tree for the whole
graph. Otherwise vertex $n$ is connected to 0, and there’s some number $k \le n$
such that vertices $n, n-1, \ldots, k$ are connected directly but the edge between
$k$ and $k - 1$ is not in the subtree. Then there can’t be any edges between
0 and $\{n-1, \ldots, k\}$, or there would be a cycle. If $k = 1$, the spanning tree
is therefore determined completely. And if $k > 1$, any of the $f_{k-1}$ ways to
produce a spanning tree on $\{0, 1, \ldots, k-1\}$ will yield a spanning tree on the
whole graph. For example, here’s what this analysis produces when $n = 4$:

[Figure: the cases k = 4, k = 3, k = 2, and k = 1 for the fan of order 4]

The general equation, valid for $n \ge 1$, is

$$f_n = f_{n-1} + f_{n-1} + f_{n-2} + f_{n-3} + \cdots + f_1 + 1.$$

(It almost seems as though the ‘1’ on the end is $f_0$ and we should have chosen
$f_0 = 1$; but we will doggedly stick with our choice.) A few changes suffice to
make the equation valid for all integers $n$:

$$f_n = f_{n-1} + \sum_{k<n} f_k + [n > 0]. \qquad (7.41)$$
This is a recurrence that “goes all the way back” from $f_{n-1}$ through all
previous values, so it’s different from the other recurrences we’ve seen so far
in this chapter. We used a special method to get rid of a similar right-side
sum in Chapter 2, when we solved the quicksort recurrence (2.12); namely,
we subtracted one instance of the recurrence from another ($f_{n+1} - f_n$). This
trick would get rid of the $\sum$ now, as it did then; but we’ll see that generating
functions allow us to work directly with such sums. (And it’s a good thing
that they do, because we will be seeing much more complicated recurrences
before long.)

Step 1 is finished; Step 2 is where we need to do a new thing:

$$F(z) = \sum_n f_n z^n = \sum_n f_{n-1}z^n + \sum_{k,n} f_k z^n[k < n] + \sum_n [n > 0]z^n$$
$$\phantom{F(z)} = zF(z) + \sum_k f_k z^k\sum_n [n > k]z^{n-k} + \frac{z}{1-z}$$
$$\phantom{F(z)} = zF(z) + F(z)\sum_{m>0} z^m + \frac{z}{1-z}$$
$$\phantom{F(z)} = zF(z) + F(z)\frac{z}{1-z} + \frac{z}{1-z}.$$

The key trick here was to change $z^n$ to $z^k z^{n-k}$; this made it possible to express
the value of the double sum in terms of $F(z)$, as required in Step 2.

Now Step 3 is simple algebra, and we find

$$F(z) = \frac{z}{1 - 3z + z^2}.$$

Those of us with a zest for memorization will recognize this as the generating
function (7.24) for the even-numbered Fibonacci numbers. So, we needn’t go
through Step 4; we have found a somewhat surprising answer to the
spans-of-fans problem:

$$f_n = F_{2n}, \quad\text{for } n \ge 0. \qquad (7.42)$$
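The answer $f_n = F_{2n}$ can be cross-checked in two independent ways for small $n$: by iterating recurrence (7.41), and by brute-force enumeration of edge subsets of the fan. The sketch below does both; the union-find cycle test and all function names are illustrative choices, not part of the text.

```python
from itertools import combinations


def fan_edges(n):
    """Edges of the fan of order n: spokes 0-k and rim edges k-(k+1)."""
    spokes = [(0, k) for k in range(1, n + 1)]
    rim = [(k, k + 1) for k in range(1, n)]
    return spokes + rim


def is_spanning_tree(n, edges):
    """True if `edges` (exactly n of them) form no cycle on vertices 0..n.
    n acyclic edges on n+1 vertices necessarily connect everything."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                    # this edge would close a cycle
        parent[ru] = rv
    return True


def count_spanning_trees(n):
    """Brute force over all choices of n edges out of the 2n - 1 present."""
    return sum(is_spanning_tree(n, c) for c in combinations(fan_edges(n), n))


def f_recurrence(n_max):
    """f_0..f_{n_max} from (7.41): f_n = f_{n-1} + sum_{k<n} f_k + [n > 0]."""
    f = [0]
    for n in range(1, n_max + 1):
        f.append(f[n - 1] + sum(f[:n]) + 1)
    return f


def fib(n):
    """Fibonacci number F_n with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

The brute-force count is only practical for small orders, since it examines every $n$-subset of the $2n-1$ edges, but that is enough to confirm the first several values.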


7.4  SPECIAL  GENERATING  FUNCTIONS 

Step  4 of  the  four-step  procedure  becomes  much  easier  if  we  know 
the  coefficients  of  lots  of  different  power  series.  The  expansions  in  Table  321 
are  quite  useful,  as  far  as  they  go,  but  many  other  types  of  closed  forms  are 
possible.  Therefore  we  ought  to  supplement  that  table  with  another  one, 
which  lists  power  series  that  correspond  to  the  “special  numbers”  considered 
in  Chapter  6. 
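Table 337 Generating functions for special numbers.

$$\frac{1}{(1-z)^{m+1}}\ln\frac{1}{1-z} = \sum_{n\ge0}(H_{m+n} - H_m)\binom{m+n}{n}z^n \qquad (7.43)$$

$$\frac{z}{e^z - 1} = \sum_{n\ge0} B_n\frac{z^n}{n!} \qquad (7.44)$$

$$\frac{F_m z}{1 - (F_{m-1} + F_{m+1})z + (-1)^m z^2} = \sum_{n\ge0} F_{mn}z^n \qquad (7.45)$$

$$\sum_k {m \brace k}\frac{k!\,z^k}{(1-z)^{k+1}} = \sum_{n\ge0} n^m z^n \qquad (7.46)$$

$$(z^{-1})^{\overline{-m}} = \frac{z^m}{(1-z)(1-2z)\ldots(1-mz)} = \sum_{n\ge0}{n \brace m}z^n \qquad (7.47)$$

$$z^{\overline m} = z(z+1)\ldots(z+m-1) = \sum_{n\ge0}{m \brack n}z^n \qquad (7.48)$$

$$(e^z - 1)^m = m!\sum_{n\ge0}{n \brace m}\frac{z^n}{n!} \qquad (7.49)$$

$$\Bigl(\ln\frac{1}{1-z}\Bigr)^m = m!\sum_{n\ge0}{n \brack m}\frac{z^n}{n!} \qquad (7.50)$$

$$\Bigl(\frac{z}{\ln(1+z)}\Bigr)^m = \sum_{n\ge0}\frac{z^n}{n!}{m \brace m-n}\Big/\binom{m-1}{n} \qquad (7.51)$$

$$\Bigl(\frac{z}{1 - e^{-z}}\Bigr)^m = \sum_{n\ge0}\frac{z^n}{n!}{m \brack m-n}\Big/\binom{m-1}{n} \qquad (7.52)$$

$$e^{z+wz} = \sum_{m,n\ge0}\binom{n}{m}w^m\frac{z^n}{n!} \qquad (7.53)$$

$$e^{w(e^z-1)} = \sum_{m,n\ge0}{n \brace m}w^m\frac{z^n}{n!} \qquad (7.54)$$

$$\frac{1}{(1-z)^w} = \sum_{m,n\ge0}{n \brack m}w^m\frac{z^n}{n!} \qquad (7.55)$$

$$\frac{1-w}{e^{(w-1)z} - w} = \sum_{m,n\ge0}\Bigl\langle{n \atop m}\Bigr\rangle w^m\frac{z^n}{n!} \qquad (7.56)$$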
Table 337 is the database we need. The identities in this table are not
difficult to prove, so we needn’t dwell on them; this table is primarily for
reference when we meet a new problem. But there’s a nice proof of the first
formula, (7.43), that deserves mention: We start with the identity

$$\frac{1}{(1-z)^{x+1}} = \sum_n \binom{x+n}{n}z^n$$

and differentiate it with respect to $x$. On the left, $(1-z)^{-x-1}$ is equal to
$e^{(x+1)\ln(1/(1-z))}$, so $d/dx$ contributes a factor of $\ln(1/(1-z))$. On the right,
the numerator of $\binom{x+n}{n}$ is $(x+n)\ldots(x+1)$, and $d/dx$ splits this into $n$ terms
whose sum is equivalent to multiplying $\binom{x+n}{n}$ by

$$\frac{1}{x+n} + \cdots + \frac{1}{x+1} = H_{x+n} - H_x.$$

Replacing $x$ by $m$ gives (7.43). Notice that $H_{x+n} - H_x$ is meaningful even
when $x$ is not an integer.

By the way, this method of differentiating a complicated product (leaving
it as a product) is usually better than expressing the derivative as a sum.
For example the right side of

$$\frac{d}{dx}\Bigl((x+n)^n\ldots(x+1)^1\Bigr) = (x+n)^n\ldots(x+1)^1\Bigl(\frac{n}{x+n} + \cdots + \frac{1}{x+1}\Bigr)$$

would be a lot messier written out as a sum.

The general identities in Table 337 include many important special cases.
For example, (7.43) simplifies to the generating function for $H_n$ when $m = 0$:

$$\frac{1}{1-z}\ln\frac{1}{1-z} = \sum_n H_n z^n. \qquad (7.57)$$

This equation can also be derived in other ways; for example, we can take the
power series for $\ln(1/(1-z))$ and divide it by $1 - z$ to get cumulative sums.

Identities (7.51) and (7.52) involve the respective ratios ${m \brace m-n}\big/\binom{m-1}{n}$
and ${m \brack m-n}\big/\binom{m-1}{n}$, which have the undefined form 0/0 when $n \ge m$. However,
there is a way to give them a proper meaning using the Stirling polynomials
of (6.45), because we have

$${m \brace m-n}\Big/\binom{m-1}{n} = (-1)^{n+1}\,n!\,m\,\sigma_n(n-m); \qquad (7.58)$$

$${m \brack m-n}\Big/\binom{m-1}{n} = n!\,m\,\sigma_n(m). \qquad (7.59)$$
Thus, for example, the case $m = 1$ of (7.51) should not be regarded as the
power series $\sum_{n\ge0}(z^n/n!){1 \brace 1-n}\big/\binom{0}{n}$, but rather as

$$\frac{z}{\ln(1+z)} = -\sum_{n\ge0}(-z)^n\sigma_n(n-1) = 1 + \tfrac12 z - \tfrac1{12}z^2 + \cdots.$$

Identities (7.53), (7.55), (7.54), and (7.56) are “double generating
functions” or “super generating functions” because they have the form $G(w, z) =
\sum_{m,n} g_{m,n}w^m z^n$. The coefficient of $w^m$ is a generating function in the
variable $z$; the coefficient of $z^n$ is a generating function in the variable $w$.


[Margin note: I always thought convolution was what happens to my brain when I
try to do a proof.]


7.5  CONVOLUTIONS 

The convolution of two given sequences $\langle f_0, f_1, \ldots\rangle = \langle f_n\rangle$ and
$\langle g_0, g_1, \ldots\rangle = \langle g_n\rangle$ is the sequence $\langle f_0g_0,\ f_0g_1 + f_1g_0,\ \ldots\rangle = \bigl\langle\sum_k f_k g_{n-k}\bigr\rangle$.
We have observed in Sections 5.4 and 7.2 that convolution of sequences
corresponds to multiplication of their generating functions. This fact makes it
easy to evaluate many sums that would otherwise be difficult to handle.

Example 1: A Fibonacci convolution.

For example, let’s try to evaluate $\sum_{k=0}^{n} F_kF_{n-k}$ in closed form. This is
the convolution of $\langle F_n\rangle$ with itself, so the sum must be the coefficient of $z^n$
in $F(z)^2$, where $F(z)$ is the generating function for $\langle F_n\rangle$. All we have to do is
figure out the value of this coefficient.

The generating function $F(z)$ is $z/(1 - z - z^2)$, a quotient of polynomials; so
the general expansion theorem for rational functions tells us that the answer
can be obtained from a partial fraction representation. We can use the general
expansion theorem (7.30) and grind away; or we can use the fact that

$$F(z)^2 = \Bigl(\frac{1}{\sqrt5}\Bigl(\frac{1}{1-\phi z} - \frac{1}{1-\hat\phi z}\Bigr)\Bigr)^2$$
$$\phantom{F(z)^2} = \frac15\Bigl(\frac{1}{(1-\phi z)^2} - \frac{2}{(1-\phi z)(1-\hat\phi z)} + \frac{1}{(1-\hat\phi z)^2}\Bigr)$$
$$\phantom{F(z)^2} = \frac15\sum_{n\ge0}(n+1)\phi^n z^n - \frac25\sum_{n\ge0}F_{n+1}z^n + \frac15\sum_{n\ge0}(n+1)\hat\phi^n z^n.$$

Instead of expressing the answer in terms of $\phi$ and $\hat\phi$, let’s try for a closed
form in terms of Fibonacci numbers. Recalling that $\phi + \hat\phi = 1$, we have

$$\phi^n + \hat\phi^n = [z^n]\Bigl(\frac{1}{1-\phi z} + \frac{1}{1-\hat\phi z}\Bigr) = [z^n]\,\frac{2 - (\phi+\hat\phi)z}{(1-\phi z)(1-\hat\phi z)} = [z^n]\,\frac{2 - z}{1 - z - z^2} = 2F_{n+1} - F_n.$$
Hence

$$F(z)^2 = \frac15\sum_{n\ge0}(n+1)(2F_{n+1} - F_n)z^n - \frac25\sum_{n\ge0}F_{n+1}z^n,$$

and we have the answer we seek:

$$\sum_{k=0}^{n} F_kF_{n-k} = \frac{2nF_{n+1} - (n+1)F_n}{5}. \qquad (7.60)$$

For example, when $n = 3$ this formula gives $F_0F_3 + F_1F_2 + F_2F_1 + F_3F_0 =
0 + 1 + 1 + 0 = 2$ on the left and $(6F_4 - 4F_3)/5 = (18 - 8)/5 = 2$ on the right.
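Identity (7.60) is easy to test beyond $n = 3$. The sketch below compares the convolution sum with the closed form for a range of $n$, multiplying through by 5 so that only integer arithmetic is needed; the function names are illustrative.

```python
def fibs(count):
    """List of the first `count` Fibonacci numbers F_0, F_1, ..."""
    F = [0, 1]
    while len(F) < count:
        F.append(F[-1] + F[-2])
    return F[:count]


def fib_convolution(n, F):
    """Left side of (7.60): sum of F_k * F_{n-k} for 0 <= k <= n."""
    return sum(F[k] * F[n - k] for k in range(n + 1))


def five_times_closed_form(n, F):
    """Five times the right side of (7.60): 2n F_{n+1} - (n+1) F_n."""
    return 2 * n * F[n + 1] - (n + 1) * F[n]


F = fibs(60)
agree = all(5 * fib_convolution(n, F) == five_times_closed_form(n, F)
            for n in range(58))
```

Comparing `5 * sum` with the numerator sidesteps the question of whether the division by 5 is exact, which the identity itself guarantees.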

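Example 2: Harmonic convolutions.

The efficiency of a certain computer method called “samplesort” depends
on the value of the sum

$$T_{m,n} = \sum_{0\le k<n}\binom{k}{m}\frac{1}{n-k}, \quad\text{integers } m, n \ge 0.$$

Exercise 5.58 obtains the value of this sum by a somewhat intricate double
induction, using summation factors. It’s much easier to realize that $T_{m,n}$ is
just the $n$th term in the convolution of $\bigl\langle\binom0m, \binom1m, \binom2m, \ldots\bigr\rangle$ with $\langle 0, \frac11, \frac12, \ldots\rangle$.
Both sequences have simple generating functions in Table 321:

$$\sum_{n\ge0}\binom{n}{m}z^n = \frac{z^m}{(1-z)^{m+1}}; \qquad \sum_{n>0}\frac{z^n}{n} = \ln\frac{1}{1-z}.$$

Therefore, by (7.43),

$$T_{m,n} = [z^n]\,\frac{z^m}{(1-z)^{m+1}}\ln\frac{1}{1-z} = [z^{n-m}]\,\frac{1}{(1-z)^{m+1}}\ln\frac{1}{1-z} = (H_n - H_m)\binom{n}{n-m}.$$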

In fact, there are many more sums that boil down to this same sort of
convolution, because we have

$$\frac{1}{(1-z)^{r+1}}\ln\frac{1}{1-z}\cdot\frac{1}{(1-z)^{s+1}} = \frac{1}{(1-z)^{r+s+2}}\ln\frac{1}{1-z}$$

for all $r$ and $s$. Equating coefficients of $z^n$ gives the general identity

$$\sum_k \binom{r+k}{k}\binom{s+n-k}{n-k}(H_{r+k} - H_r) = \binom{r+s+n+1}{n}(H_{r+s+n+1} - H_{r+s+1}). \qquad (7.61)$$
[Margin note: Because it’s so harmonic.]

This seems almost too good to be true. But it checks, at least when $n = 2$:

$$\binom{r+1}{1}\binom{s+1}{1}\frac{1}{r+1} + \binom{r+2}{2}\binom{s+0}{0}\Bigl(\frac{1}{r+2} + \frac{1}{r+1}\Bigr) = \binom{r+s+3}{2}\Bigl(\frac{1}{r+s+3} + \frac{1}{r+s+2}\Bigr).$$

Special cases like $s = 0$ are as remarkable as the general case.
And there’s more. We can use the convolution identity

$$\sum_k \binom{r+k}{k}\binom{s+n-k}{n-k} = \binom{r+s+n+1}{n}$$

to transpose $H_r$ to the other side, since $H_r$ is independent of $k$:

$$\sum_k \binom{r+k}{k}\binom{s+n-k}{n-k}H_{r+k} = \binom{r+s+n+1}{n}(H_{r+s+n+1} - H_{r+s+1} + H_r). \qquad (7.62)$$


There’s still more: If $r$ and $s$ are nonnegative integers $l$ and $m$, we can replace
$\binom{r+k}{k}$ by $\binom{l+k}{l}$ and $\binom{s+n-k}{n-k}$ by $\binom{m+n-k}{m}$; then we can change $k$ to $k - l$ and
$n$ to $n - m - l$, getting

$$\sum_{k=0}^{n}\binom{k}{l}\binom{n-k}{m}H_k = \binom{n+1}{l+m+1}(H_{n+1} - H_{l+m+1} + H_l), \quad\text{integers } l, m, n \ge 0. \qquad (7.63)$$

Even the special case $l = m = 0$ of this identity was difficult for us to handle
in Chapter 2! (See (2.36).) We’ve come a long way.
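Identity (7.63) involves harmonic numbers, so an exact check wants rational arithmetic. Here is a small sketch that compares both sides over a grid of small $l$, $m$, $n$; the helper names `harmonic`, `lhs_763`, and `rhs_763` are illustrative.

```python
from fractions import Fraction
from math import comb


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def lhs_763(l, m, n):
    """Left side of (7.63): sum over 0 <= k <= n of C(k,l) C(n-k,m) H_k."""
    return sum(comb(k, l) * comb(n - k, m) * harmonic(k) for k in range(n + 1))


def rhs_763(l, m, n):
    """Right side of (7.63)."""
    return comb(n + 1, l + m + 1) * (
        harmonic(n + 1) - harmonic(l + m + 1) + harmonic(l))


all_agree = all(lhs_763(l, m, n) == rhs_763(l, m, n)
                for l in range(5) for m in range(5) for n in range(12))
```

With `l = m = 0` the identity reduces to the sum of the first harmonic numbers, $\sum_{k=0}^{n} H_k = (n+1)(H_{n+1} - 1)$.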

Example 3: Convolutions of convolutions.

If we form the convolution of $\langle f_n\rangle$ and $\langle g_n\rangle$, then convolve this with a
third sequence $\langle h_n\rangle$, we get a sequence whose $n$th term is

$$\sum_{j+k+l=n} f_j\,g_k\,h_l.$$

The generating function of this three-fold convolution is, of course, the
three-fold product $F(z)\,G(z)\,H(z)$. In a similar way, the $m$-fold convolution of a
sequence $\langle g_n\rangle$ with itself has $n$th term equal to

$$\sum_{k_1+k_2+\cdots+k_m=n} g_{k_1}g_{k_2}\ldots g_{k_m}$$

and its generating function is $G(z)^m$.
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We  can  apply  these  observations  to  the  spans-of-fans  problem  considered 
earlier  (Example  6 in  Section  7.3).  It  turns  out  that  there’s  another  way  to 
compute  fn,  the  number  of  spanning  trees  of  an  n-fan,  based  on  the  config- 
urations of  tree  edges  between  the  vertices  {1,2,.  . . , n } : The  edge  between 
vertex  k and  vertex  k + 1 may  or  may  not  be  selected  for  the  subtree;  and 
each  of  the  ways  to  select  these  edges  connects  up  certain  blocks  of  adjacent  Concrete  blocks, 
vertices.  For  example,  when  n = 1 0 we  might  connect  vertices  (1 , 2},  {3}, 

{4, 5, 6, 7},  and  {8,9,10}: 

T1  o 

"9 
• 8 
»7 
"6 
"5 
• 4 

•3 


How many spanning trees can we make, by adding additional edges to vertex 0? We need to connect 0 to each of the four blocks; and there are two ways to join 0 with {1, 2}, one way to join it with {3}, four ways with {4, 5, 6, 7}, and three ways with {8, 9, 10}, or 2·1·4·3 = 24 ways altogether. Summing over all possible ways to make blocks gives us the following expression for the total number of spanning trees:

$$f_n = \sum_{m>0}\ \sum_{\substack{k_1+k_2+\cdots+k_m=n \\ k_1,k_2,\ldots,k_m>0}} k_1 k_2 \ldots k_m. \tag{7.64}$$

For example, $f_4$ = 4 + 3·1 + 2·2 + 1·3 + 2·1·1 + 1·2·1 + 1·1·2 + 1·1·1·1 = 21.

This is the sum of m-fold convolutions of the sequence $\langle 0, 1, 2, 3, \ldots\rangle$, for m = 1, 2, 3, ...; hence the generating function for $\langle f_n\rangle$ is

$$F(z) = G(z) + G(z)^2 + G(z)^3 + \cdots = \frac{G(z)}{1 - G(z)},$$

where G(z) is the generating function for $\langle 0, 1, 2, 3, \ldots\rangle$, namely $z/(1-z)^2$. Consequently we have

$$F(z) = \frac{z}{(1-z)^2 - z} = \frac{z}{1 - 3z + z^2},$$

as before. This approach to $\langle f_n\rangle$ is more symmetrical and appealing than the complicated recurrence we had earlier.
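
The block decomposition can be checked numerically. The following is only a sketch, with hypothetical function names: it computes $f_n$ once from $F(z) = G(z) + G(z)F(z)$ with $[z^k]\,G(z) = k$, and once from the recurrence $f_n = 3f_{n-1} - f_{n-2}$ implied by the denominator $1 - 3z + z^2$, so that the two lists can be compared.

```python
def fan_trees_by_blocks(n_max):
    """f_n from F(z) = G(z) + G(z) F(z), where [z^k] G(z) = k."""
    f = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        # Either all n vertices form a single block (n ways to join it to
        # vertex 0), or one block has size k < n and the remaining n - k
        # vertices form a smaller instance of the same problem.
        f[n] = n + sum(k * f[n - k] for k in range(1, n))
    return f


def fan_trees_by_closed_form(n_max):
    """f_n from z / (1 - 3z + z^2), i.e. f_n = 3 f_{n-1} - f_{n-2}."""
    f = [0, 1]
    for _ in range(2, n_max + 1):
        f.append(3 * f[-1] - f[-2])
    return f[: n_max + 1]


print(fan_trees_by_blocks(6))
print(fan_trees_by_closed_form(6))
```

Both functions should print the same list, whose entry for n = 4 is the value 21 worked out by hand above.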

Example 4: A convoluted recurrence.

Our next example is especially important; in fact, it’s the “classic example” of why generating functions are useful in the solution of recurrences.

Suppose we have n + 1 variables $x_0, x_1, \ldots, x_n$ whose product is to be computed by doing n multiplications. How many ways $C_n$ are there to insert parentheses into the product $x_0 \cdot x_1 \cdot \ldots \cdot x_n$ so that the order of multiplication is completely specified? For example, when n = 2 there are two ways, $x_0 \cdot (x_1 \cdot x_2)$ and $(x_0 \cdot x_1) \cdot x_2$. And when n = 3 there are five ways:

$$x_0 \cdot (x_1 \cdot (x_2 \cdot x_3)), \quad x_0 \cdot ((x_1 \cdot x_2) \cdot x_3), \quad (x_0 \cdot x_1) \cdot (x_2 \cdot x_3),$$
$$(x_0 \cdot (x_1 \cdot x_2)) \cdot x_3, \quad ((x_0 \cdot x_1) \cdot x_2) \cdot x_3.$$

Thus $C_2 = 2$, $C_3 = 5$; we also have $C_1 = 1$ and $C_0 = 1$.

Let’s use the four-step procedure of Section 7.3. What is a recurrence for the C’s? The key observation is that there’s exactly one ‘·’ operation outside all of the parentheses, when n > 0; this is the final multiplication that ties everything together. If this ‘·’ occurs between $x_k$ and $x_{k+1}$, there are $C_k$ ways to fully parenthesize $x_0 \cdot \ldots \cdot x_k$, and there are $C_{n-k-1}$ ways to fully parenthesize $x_{k+1} \cdot \ldots \cdot x_n$; hence

$$C_n = C_0 C_{n-1} + C_1 C_{n-2} + \cdots + C_{n-1} C_0, \qquad \text{if } n > 0.$$

By now we recognize this expression as a convolution, and we know how to patch the formula so that it holds for all integers n:

$$C_n = \sum_k C_k C_{n-1-k} + [n = 0]. \tag{7.65}$$

Step 1 is now complete. Step 2 tells us to multiply by $z^n$ and sum:

$$\begin{aligned}
C(z) &= \sum_n C_n z^n \\
&= \sum_{k,n} C_k C_{n-1-k} z^n + \sum_{n=0} z^n \\
&= \sum_k C_k z^k \sum_n C_{n-1-k} z^{n-k} + 1 \\
&= C(z) \cdot zC(z) + 1.
\end{aligned}$$

Lo and behold, the convolution has become a product, in the generating-function world. Life is full of surprises.

The authors jest.

Step 3 is also easy. We solve for C(z) by the quadratic formula:

$$C(z) = \frac{1 \pm \sqrt{1 - 4z}}{2z}.$$

But should we choose the + sign or the − sign? Both choices yield a function that satisfies $C(z) = zC(z)^2 + 1$, but only one of the choices is suitable for our problem. We might choose the + sign on the grounds that positive thinking is best; but we soon discover that this choice gives $C(0) = \infty$, contrary to the facts. (The correct function C(z) is supposed to have $C(0) = C_0 = 1$.) Therefore we conclude that

$$C(z) = \frac{1 - \sqrt{1 - 4z}}{2z}.$$


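Finally, Step 4. What is $[z^n]\,C(z)$? The binomial theorem tells us that

$$\sqrt{1 - 4z} = \sum_{k \ge 0} \binom{1/2}{k} (-4z)^k = 1 + \sum_{k \ge 1} \frac{1}{2k} \binom{-1/2}{k-1} (-4z)^k;$$

hence, using (5.37),

$$\frac{1 - \sqrt{1 - 4z}}{2z} = \sum_{k \ge 1} \frac{1}{k} \binom{-1/2}{k-1} (-4z)^{k-1} = \sum_{n \ge 0} \binom{-1/2}{n} \frac{(-4z)^n}{n+1} = \sum_{n \ge 0} \binom{2n}{n} \frac{z^n}{n+1}.$$

The number of ways to parenthesize, $C_n$, is $\binom{2n}{n}\frac{1}{n+1}$.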
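
As a quick sanity check, the convolution recurrence (7.65) and the closed form can be compared for small n. This is a minimal sketch; the function names are illustrative and not taken from the text.

```python
from math import comb


def catalan_by_recurrence(n_max):
    """C_0..C_{n_max} from C_n = sum_k C_k C_{n-1-k} + [n = 0]."""
    c = [0] * (n_max + 1)
    c[0] = 1  # the [n = 0] term
    for n in range(1, n_max + 1):
        c[n] = sum(c[k] * c[n - 1 - k] for k in range(n))
    return c


def catalan_closed_form(n):
    """C_n = binom(2n, n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)


print(catalan_by_recurrence(8))
print([catalan_closed_form(n) for n in range(9)])
```

The two printed lists should agree, starting 1, 1, 2, 5, 14 as in the hand count of parenthesizations.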

We anticipated this result in Chapter 5, when we introduced the sequence of Catalan numbers $\langle 1, 1, 2, 5, 14, \ldots\rangle = \langle C_n\rangle$. This sequence arises in dozens of problems that seem at first to be unrelated to each other [41], because many situations have a recursive structure that corresponds to the convolution recurrence (7.65).

So the convoluted recurrence has led us to an oft-recurring convolution.

For example, let’s consider the following problem: How many sequences $\langle a_1, a_2, \ldots, a_{2n}\rangle$ of +1’s and −1’s have the property that

$$a_1 + a_2 + \cdots + a_{2n} = 0$$

and have all their partial sums

$$a_1, \quad a_1 + a_2, \quad \ldots, \quad a_1 + a_2 + \cdots + a_{2n}$$

nonnegative? There must be n occurrences of +1 and n occurrences of −1. We can represent this problem graphically by plotting the sequence of partial sums $s_n = \sum_{k=1}^{n} a_k$ as a function of n: The five solutions for n = 3 are

[Figure: the five paths of width 6 formed by the partial sums.]

These are “mountain ranges” of width 2n that can be drawn with line segments of the forms / and \. It turns out that there are exactly $C_n$ ways to do this, and the sequences can be related to the parenthesis problem in the following way: Put an extra pair of parentheses around the entire formula, so that there are n pairs of parentheses corresponding to the n multiplications. Now replace each ‘·’ by +1 and each ‘)’ by −1 and erase everything else. For example, the formula $x_0 \cdot ((x_1 \cdot x_2) \cdot (x_3 \cdot x_4))$ corresponds to the sequence $\langle +1, +1, -1, +1, +1, -1, -1, -1\rangle$ by this rule. The five ways to parenthesize $x_0 \cdot x_1 \cdot x_2 \cdot x_3$ correspond to the five mountain ranges for n = 3 shown above.

Moreover, a slight reformulation of our sequence-counting problem leads to a surprisingly simple combinatorial solution that avoids the use of generating functions: How many sequences $\langle a_0, a_1, a_2, \ldots, a_{2n}\rangle$ of +1’s and −1’s have the property that

$$a_0 + a_1 + a_2 + \cdots + a_{2n} = 1,$$

when all the partial sums

$$a_0, \quad a_0 + a_1, \quad a_0 + a_1 + a_2, \quad \ldots, \quad a_0 + a_1 + \cdots + a_{2n}$$

are required to be positive? Clearly these are just the sequences of the previous problem, with the additional element $a_0 = +1$ placed in front. But the sequences in the new problem can be enumerated by a simple counting argument, using a remarkable fact discovered by George Raney [243] in 1959: If $\langle x_1, x_2, \ldots, x_m\rangle$ is any sequence of integers whose sum is +1, exactly one of the cyclic shifts

$$\langle x_1, x_2, \ldots, x_m\rangle, \quad \langle x_2, \ldots, x_m, x_1\rangle, \quad \ldots, \quad \langle x_m, x_1, \ldots, x_{m-1}\rangle$$

has all of its partial sums positive. For example, consider the sequence $\langle 3, -5, 2, -2, 3, 0\rangle$. Its cyclic shifts are

$$\begin{array}{ll}
\langle 3, -5, 2, -2, 3, 0\rangle & \langle -2, 3, 0, 3, -5, 2\rangle \\
\langle -5, 2, -2, 3, 0, 3\rangle & \langle 3, 0, 3, -5, 2, -2\rangle \ \checkmark \\
\langle 2, -2, 3, 0, 3, -5\rangle & \langle 0, 3, -5, 2, -2, 3\rangle
\end{array}$$

and only the one that’s checked has entirely positive partial sums.
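
Raney’s lemma is easy to test by brute force. The sketch below (the helper name `positive_shifts` is hypothetical) lists every cyclic shift of a sequence and keeps those whose partial sums are all positive; applied to the example sequence it should return exactly one shift.

```python
from itertools import accumulate


def positive_shifts(xs):
    """Return the cyclic shifts of xs whose partial sums are all positive."""
    shifts = [xs[i:] + xs[:i] for i in range(len(xs))]
    return [s for s in shifts if all(p > 0 for p in accumulate(s))]


example = [3, -5, 2, -2, 3, 0]  # total sum is +1
print(positive_shifts(example))
```

Changing `example` to any other integer list with sum +1 gives a way to experiment with the lemma before reading its proof.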

Raney’s lemma can be proved by a simple geometric argument. Let’s extend the sequence periodically to get an infinite sequence

$$\langle x_1, x_2, \ldots, x_m, x_1, x_2, \ldots, x_m, x_1, x_2, \ldots\rangle;$$

thus we let $x_{m+k} = x_k$ for all k ≥ 0. If we now plot the partial sums $s_n = x_1 + \cdots + x_n$ as a function of n, the graph of $s_n$ has an “average slope” of 1/m, because $s_{m+n} = s_n + 1$. For example, the graph corresponding to our example sequence $\langle 3, -5, 2, -2, 3, 0, 3, -5, 2, \ldots\rangle$ begins as follows:

[Figure: graph of the partial sums $s_n$, lying between two parallel bounding lines of slope 1/6.]

The entire graph can be contained between two lines of slope 1/m, as shown; we have m = 6 in the illustration. In general these bounding lines touch the graph just once in each cycle of m points, since lines of slope 1/m hit points with integer coordinates only once per m units. The unique lower point of intersection is the only place in the cycle from which all partial sums will be positive, because every other point on the curve has an intersection point within m units to its right.

With Raney’s lemma we can easily enumerate the sequences $\langle a_0, \ldots, a_{2n}\rangle$ of +1’s and −1’s whose partial sums are entirely positive and whose total sum is +1. There are $\binom{2n+1}{n}$ sequences with n occurrences of −1 and n + 1 occurrences of +1, and Raney’s lemma tells us that exactly 1/(2n + 1) of these sequences have all partial sums positive. (List all $N = \binom{2n+1}{n}$ of these sequences and all 2n + 1 of their cyclic shifts, in an N × (2n + 1) array. Each row contains exactly one solution. Each solution appears exactly once in each column. So there are N/(2n + 1) distinct solutions in the array, each appearing (2n + 1) times.) The total number of sequences with positive partial sums is

$$\binom{2n+1}{n} \frac{1}{2n+1} = \binom{2n}{n} \frac{1}{n+1} = C_n.$$


Ah, if stock prices would only continue to rise like this.

(Attention, computer scientists: The partial sums in this problem represent the stack size as a function of time, when a product of n + 1 factors is evaluated, because each “push” operation changes the size by +1 and each multiplication changes it by −1.)

Example 5: A recurrence with m-fold convolution.

We can generalize the problem just considered by looking at sequences $\langle a_0, \ldots, a_{mn}\rangle$ of +1’s and (1 − m)’s whose partial sums are all positive and whose total sum is +1. Such sequences can be called m-Raney sequences. If there are k occurrences of (1 − m) and mn + 1 − k occurrences of +1, we have

$$k(1 - m) + (mn + 1 - k) = 1,$$

hence k = n. There are $\binom{mn+1}{n}$ sequences with n occurrences of (1 − m) and mn + 1 − n occurrences of +1, and Raney’s lemma tells us that the number of such sequences with all partial sums positive is exactly

$$\binom{mn+1}{n} \frac{1}{mn+1} = \binom{mn}{n} \frac{1}{(m-1)n+1}. \tag{7.66}$$

(Attention, computer scientists: The stack interpretation now applies with respect to an m-ary operation, instead of the binary multiplication considered earlier.)

So this is the number of m-Raney sequences. Let’s call this a Fuss-Catalan number $C_n^{(m)}$, because the sequence $\langle C_n^{(m)}\rangle$ was first investigated by N. I. Fuss [109] in 1791 (many years before Catalan himself got into the act). The ordinary Catalan numbers are $C_n = C_n^{(2)}$.

Now that we know the answer, (7.66), let’s play “Jeopardy” and figure out a question that leads to it. In the case m = 2 the question was: “What numbers $C_n$ satisfy the recurrence $C_n = \sum_k C_k C_{n-1-k} + [n = 0]$?” We will try to find a similar question (a similar recurrence) in the general case.

The trivial sequence $\langle +1\rangle$ of length 1 is clearly an m-Raney sequence. If we put the number (1 − m) at the right of any m sequences that are m-Raney, we get an m-Raney sequence; the partial sums stay positive as they increase to +2, then +3, ..., +m, and +1. Conversely, we can show that all m-Raney sequences $\langle a_0, \ldots, a_{mn}\rangle$ arise in this way, if n > 0: The last term $a_{mn}$ must be (1 − m). The partial sums $s_j = a_0 + \cdots + a_{j-1}$ are positive for 1 ≤ j ≤ mn, and $s_{mn} = m$ because $s_{mn} + a_{mn} = 1$. Let $k_1$ be the largest index ≤ mn such that $s_{k_1} = 1$; let $k_2$ be largest such that $s_{k_2} = 2$; and so on. Thus $s_{k_j} = j$ and $s_k > j$, for $k_j < k \le mn$ and 1 ≤ j ≤ m. It follows that $k_m = mn$, and we can verify without difficulty that each of the subsequences $\langle a_0, \ldots, a_{k_1-1}\rangle$, $\langle a_{k_1}, \ldots, a_{k_2-1}\rangle$, ..., $\langle a_{k_{m-1}}, \ldots, a_{k_m-1}\rangle$ is an m-Raney sequence. We must have $k_1 = mn_1 + 1$, $k_2 - k_1 = mn_2 + 1$, ..., $k_m - k_{m-1} = mn_m + 1$, for some nonnegative integers $n_1, n_2, \ldots, n_m$.

Therefore $\binom{mn+1}{n}\frac{1}{mn+1}$ is the answer to the following two interesting questions: “What are the numbers $C_n^{(m)}$ defined by the recurrence

$$C_n^{(m)} = \Biggl(\sum_{n_1+n_2+\cdots+n_m=n-1} C_{n_1}^{(m)} C_{n_2}^{(m)} \ldots C_{n_m}^{(m)}\Biggr) + [n = 0] \tag{7.67}$$

for all integers n?” “If G(z) is a power series that satisfies

$$G(z) = zG(z)^m + 1, \tag{7.68}$$

what is $[z^n]\,G(z)$?”
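
The second question can be explored numerically without solving (7.68) in closed form. The sketch below is an illustration under stated assumptions, not the book’s method: it iterates $G \leftarrow 1 + zG^m$ on truncated coefficient lists, each pass fixing one more coefficient, and compares the result with the Fuss-Catalan formula. All function names are hypothetical.

```python
from math import comb


def poly_mul(a, b, n_max):
    """Product of two coefficient lists, truncated after z^n_max."""
    out = [0] * (n_max + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j > n_max:
                break
            out[i + j] += ai * bj
    return out


def fuss_catalan_series(m, n_max):
    """Coefficients of the power series G with G = z G^m + 1, up to z^n_max."""
    g = [1] + [0] * n_max
    for _pass in range(n_max):
        power = [1] + [0] * n_max
        for _factor in range(m):
            power = poly_mul(power, g, n_max)
        g = [1] + power[:n_max]  # 1 + z * G^m, truncated
    return g


def fuss_catalan_closed_form(m, n):
    """binom(mn + 1, n) / (mn + 1); the division is exact."""
    return comb(m * n + 1, n) // (m * n + 1)


print(fuss_catalan_series(3, 5))
print([fuss_catalan_closed_form(3, n) for n in range(6)])
```

With m = 2 the same functions reproduce the ordinary Catalan numbers.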

Notice that these are not easy questions. In the ordinary Catalan case (m = 2), we solved (7.68) for G(z) and its coefficients by using the quadratic formula and the binomial theorem; but when m = 3, none of the standard techniques gives any clue about how to solve the cubic equation $G = zG^3 + 1$. So it has turned out to be easier to answer this question before asking it.

Now, however, we know enough to ask even harder questions and deduce their answers. How about this one: “What is $[z^n]\,G(z)^l$, if l is a positive integer and if G(z) is the power series defined by (7.68)?” The argument we just gave can be used to show that $[z^n]\,G(z)^l$ is the number of sequences of length mn + l with the following three properties:

Each element is either +1 or (1 − m).

The partial sums are all positive.

The total sum is l.

For we get all such sequences in a unique way by putting together l sequences that have the m-Raney property. The number of ways to do this is

$$\sum_{n_1+n_2+\cdots+n_l=n} C_{n_1}^{(m)} C_{n_2}^{(m)} \ldots C_{n_l}^{(m)} = [z^n]\,G(z)^l.$$

Raney proved a generalization of his lemma that tells us how to count such sequences: If $\langle x_1, x_2, \ldots, x_m\rangle$ is any sequence of integers with $x_j \le 1$ for all j, and with $x_1 + x_2 + \cdots + x_m = l > 0$, then exactly l of the cyclic shifts

$$\langle x_1, x_2, \ldots, x_m\rangle, \quad \langle x_2, \ldots, x_m, x_1\rangle, \quad \ldots, \quad \langle x_m, x_1, \ldots, x_{m-1}\rangle$$

have all positive partial sums.

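For example, we can check this statement on the sequence $\langle -2, 1, -1, 0, 1, 1, -1, 1, 1, 1\rangle$. The cyclic shifts are

$$\begin{array}{ll}
\langle -2, 1, -1, 0, 1, 1, -1, 1, 1, 1\rangle & \langle 1, -1, 1, 1, 1, -2, 1, -1, 0, 1\rangle \\
\langle 1, -1, 0, 1, 1, -1, 1, 1, 1, -2\rangle & \langle -1, 1, 1, 1, -2, 1, -1, 0, 1, 1\rangle \\
\langle -1, 0, 1, 1, -1, 1, 1, 1, -2, 1\rangle & \langle 1, 1, 1, -2, 1, -1, 0, 1, 1, -1\rangle \ \checkmark \\
\langle 0, 1, 1, -1, 1, 1, 1, -2, 1, -1\rangle & \langle 1, 1, -2, 1, -1, 0, 1, 1, -1, 1\rangle \\
\langle 1, 1, -1, 1, 1, 1, -2, 1, -1, 0\rangle \ \checkmark & \langle 1, -2, 1, -1, 0, 1, 1, -1, 1, 1\rangle
\end{array}$$

and only the two examples marked ‘✓’ have all partial sums positive. This generalized lemma is proved in exercise 13.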

A sequence of +1’s and (1 − m)’s that has length mn + l and total sum l must have exactly n occurrences of (1 − m). The generalized lemma tells us that l/(mn + l) of these $\binom{mn+l}{n}$ sequences have all partial sums positive; hence our tough question has a surprisingly simple answer:

$$[z^n]\,G(z)^l = \binom{mn+l}{n} \frac{l}{mn+l}, \tag{7.69}$$

for all integers l > 0.
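
Formula (7.69) can be spot-checked by exhaustive enumeration for small parameters. The following sketch (function names are illustrative) generates every sequence of length mn + l containing exactly n copies of (1 − m), counts those whose partial sums stay positive, and compares the count with the formula.

```python
from itertools import accumulate, combinations
from math import comb


def count_positive_sequences(m, n, l):
    """Count sequences of +1's and (1-m)'s, length mn + l, total sum l,
    whose partial sums are all positive (brute force)."""
    length = m * n + l
    count = 0
    for positions in combinations(range(length), n):
        negative_at = set(positions)
        seq = [(1 - m) if i in negative_at else 1 for i in range(length)]
        if all(p > 0 for p in accumulate(seq)):
            count += 1
    return count


def formula_7_69(m, n, l):
    """binom(mn + l, n) * l / (mn + l); the division is exact."""
    return comb(m * n + l, n) * l // (m * n + l)


for m, n, l in [(2, 3, 1), (3, 2, 2), (3, 3, 3)]:
    print((m, n, l), count_positive_sequences(m, n, l), formula_7_69(m, n, l))
```

The enumeration grows quickly with mn + l, so it is only practical for small cases; it is meant as a check, not as a way to compute the coefficients.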

Readers who haven’t forgotten Chapter 5 might well be experiencing déjà vu: “That formula looks familiar; haven’t we seen it before?” Yes, indeed; equation (5.60) says that

$$[z^n]\,\mathcal{B}_t(z)^r = \binom{tn+r}{n} \frac{r}{tn+r}.$$

Therefore the generating function G(z) in (7.68) must actually be the generalized binomial series $\mathcal{B}_m(z)$. Sure enough, equation (5.59) says

$$\mathcal{B}_m(z)^{1-m} - \mathcal{B}_m(z)^{-m} = z,$$

which is the same as

$$\mathcal{B}_m(z) - 1 = z\,\mathcal{B}_m(z)^m.$$

Let’s switch to the notation of Chapter 5, now that we know we’re dealing with generalized binomials. Chapter 5 stated a bunch of identities without proof. We have now closed part of the gap by proving that the power series $\mathcal{B}_t(z)$ defined by

$$\mathcal{B}_t(z) = \sum_n \binom{tn+1}{n} \frac{z^n}{tn+1}$$

has the remarkable property that

$$\mathcal{B}_t(z)^r = \sum_n \binom{tn+r}{n} \frac{r\,z^n}{tn+r},$$

whenever t and r are positive integers.

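Can we extend these results to arbitrary values of t and r? Yes; because the coefficients $\binom{tn+r}{n}\frac{r}{tn+r}$ are polynomials in t and r. The general rth power defined by

$$\mathcal{B}_t(z)^r = e^{r \ln \mathcal{B}_t(z)} = \sum_{n \ge 0} \frac{\bigl(r \ln \mathcal{B}_t(z)\bigr)^n}{n!} = \sum_{n \ge 0} \frac{r^n}{n!} \Biggl(-\sum_{m \ge 1} \frac{\bigl(1 - \mathcal{B}_t(z)\bigr)^m}{m}\Biggr)^{n}$$

has coefficients that are polynomials in t and r; and those polynomials are equal to $\binom{tn+r}{n}\frac{r}{tn+r}$ for infinitely many values of t and r. So the two sequences of polynomials must be identically equal.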

Chapter 5 also mentions the generalized exponential series

$$\mathcal{E}_t(z) = \sum_{n \ge 0} \frac{(tn+1)^{n-1}}{n!} z^n,$$

which is said in (5.60) to have an equally remarkable property:

$$[z^n]\,\mathcal{E}_t(z)^r = \frac{r(tn+r)^{n-1}}{n!}. \tag{7.70}$$

We can prove this as a limiting case of the formulas for $\mathcal{B}_t(z)$, because it is not difficult to show that

$$\mathcal{E}_t(z)^r = \lim_{x \to \infty} \mathcal{B}_{xt}(z/x)^{xr}.$$


7.6 EXPONENTIAL GF’S

Sometimes a sequence $\langle g_n\rangle$ has a generating function whose properties are quite complicated, while the related sequence $\langle g_n/n!\rangle$ has a generating function that’s quite simple. In such cases we naturally prefer to work with $\langle g_n/n!\rangle$ and then multiply by n! at the end. This trick works sufficiently often that we have a special name for it: We call the power series

$$\hat{G}(z) = \sum_{n \ge 0} g_n \frac{z^n}{n!} \tag{7.71}$$

the exponential generating function or “egf” of the sequence $\langle g_0, g_1, g_2, \ldots\rangle$. This name arises because the exponential function $e^z$ is the egf of $\langle 1, 1, 1, \ldots\rangle$.

Many of the generating functions in Table 337 are actually egf’s. For example, equation (7.50) says that $\bigl(\ln\frac{1}{1-z}\bigr)^m/m!$ is the egf for the sequence $\bigl\langle \left[{0 \atop m}\right], \left[{1 \atop m}\right], \left[{2 \atop m}\right], \ldots \bigr\rangle$. The ordinary generating function for this sequence is much more complicated (and also divergent).

Exponential generating functions have their own basic maneuvers, analogous to the operations we learned in Section 7.2. For example, if we multiply the egf of $\langle g_n\rangle$ by z, we get

$$\sum_{n \ge 0} g_n \frac{z^{n+1}}{n!} = \sum_{n \ge 1} g_{n-1} \frac{z^n}{(n-1)!} = \sum_{n \ge 0} n g_{n-1} \frac{z^n}{n!};$$

this is the egf of $\langle 0, g_0, 2g_1, \ldots\rangle = \langle n g_{n-1}\rangle$.

Differentiating the egf of $\langle g_0, g_1, g_2, \ldots\rangle$ with respect to z gives

$$\sum_{n \ge 0} n g_n \frac{z^{n-1}}{n!} = \sum_{n \ge 1} g_n \frac{z^{n-1}}{(n-1)!} = \sum_{n \ge 0} g_{n+1} \frac{z^n}{n!}; \tag{7.72}$$

this is the egf of $\langle g_1, g_2, \ldots\rangle$. Thus differentiation on egf’s corresponds to the left-shift operation $(G(z) - g_0)/z$ on ordinary gf’s. (We used this left-shift property of egf’s when we studied hypergeometric series, (5.106).) Integration of an egf gives

$$\int_0^z \sum_{n \ge 0} g_n \frac{t^n}{n!}\,dt = \sum_{n \ge 0} g_n \frac{z^{n+1}}{(n+1)!} = \sum_{n \ge 1} g_{n-1} \frac{z^n}{n!}; \tag{7.73}$$

this is a right shift, the egf of $\langle 0, g_0, g_1, \ldots\rangle$.

Are we having fun yet?

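The most interesting operation on egf’s, as on ordinary gf’s, is multiplication. If $\hat{F}(z)$ and $\hat{G}(z)$ are egf’s for $\langle f_n\rangle$ and $\langle g_n\rangle$, then $\hat{F}(z)\hat{G}(z) = \hat{H}(z)$ is the egf for a sequence $\langle h_n\rangle$ called the binomial convolution of $\langle f_n\rangle$ and $\langle g_n\rangle$:

$$h_n = \sum_k \binom{n}{k} f_k\, g_{n-k}. \tag{7.74}$$

Binomial coefficients appear here because $\binom{n}{k} = n!/k!\,(n-k)!$, hence

$$\frac{h_n}{n!} = \sum_{k=0}^{n} \frac{f_k}{k!}\, \frac{g_{n-k}}{(n-k)!};$$

in other words, $\langle h_n/n!\rangle$ is the ordinary convolution of $\langle f_n/n!\rangle$ and $\langle g_n/n!\rangle$.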
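
A small numerical illustration of (7.74) may help. The sketch below, with hypothetical function names, computes a binomial convolution directly and again by dividing by factorials, convolving in the ordinary way, and multiplying back by n!. The example sequences $\langle 2^n\rangle$ and $\langle 3^n\rangle$ are chosen here because their egf’s are $e^{2z}$ and $e^{3z}$, whose product $e^{5z}$ is the egf of $\langle 5^n\rangle$.

```python
from fractions import Fraction
from math import comb, factorial


def binomial_convolution(f, g):
    """h_n = sum_k binom(n, k) f_k g_{n-k}, for as many n as both lists allow."""
    n_max = min(len(f), len(g)) - 1
    return [
        sum(comb(n, k) * f[k] * g[n - k] for k in range(n + 1))
        for n in range(n_max + 1)
    ]


def via_ordinary_convolution(f, g):
    """Same sequence, computed as n! times the ordinary convolution
    of <f_n/n!> and <g_n/n!>."""
    n_max = min(len(f), len(g)) - 1
    fs = [Fraction(f[n], factorial(n)) for n in range(n_max + 1)]
    gs = [Fraction(g[n], factorial(n)) for n in range(n_max + 1)]
    return [
        factorial(n) * sum(fs[k] * gs[n - k] for k in range(n + 1))
        for n in range(n_max + 1)
    ]


twos = [2**n for n in range(8)]
threes = [3**n for n in range(8)]
print(binomial_convolution(twos, threes))
```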

Binomial convolutions occur frequently in applications. For example, we defined the Bernoulli numbers in (6.79) by the implicit recurrence

$$\sum_{j=0}^{m} \binom{m+1}{j} B_j = [m = 0], \qquad \text{for all } m \ge 0;$$

this can be rewritten as a binomial convolution, if we substitute n for m + 1 and add the term $B_n$ to both sides:

$$\sum_k \binom{n}{k} B_k = B_n + [n = 1], \qquad \text{for all } n \ge 0. \tag{7.75}$$

We can now relate this recurrence to power series (as promised in Chapter 6) by introducing the egf for Bernoulli numbers, $\hat{B}(z) = \sum_{n \ge 0} B_n z^n/n!$. The left-hand side of (7.75) is the binomial convolution of $\langle B_n\rangle$ with the constant sequence $\langle 1, 1, 1, \ldots\rangle$; hence the egf of the left-hand side is $\hat{B}(z)e^z$. The egf of the right-hand side is $\sum_{n \ge 0} (B_n + [n = 1])\,z^n/n! = \hat{B}(z) + z$. Therefore we must have $\hat{B}(z) = z/(e^z - 1)$; we have proved equation (6.81), which appears also in Table 337 as equation (7.44).
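
Recurrence (7.75) also gives a direct way to compute Bernoulli numbers. In the sketch below (the function name is hypothetical), $B_n$ is cancelled from both sides, which leaves $\sum_{k<n} \binom{n}{k} B_k = [n = 1]$; the term with k = n − 1 has coefficient n, so each step solves for $B_{n-1}$ using exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max):
    """Return [B_0, ..., B_{n_max}] as exact fractions, using (7.75)."""
    b = []
    for n in range(1, n_max + 2):
        # sum_{k<n} binom(n, k) B_k = [n = 1]; isolate the k = n - 1 term.
        rhs = Fraction(1 if n == 1 else 0)
        known = sum(comb(n, k) * b[k] for k in range(n - 1))
        b.append((rhs - known) / n)
    return b


print(bernoulli_numbers(8))
```

Note that this convention has $B_1 = -1/2$, matching the egf $z/(e^z - 1)$.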

Now let’s look again at a sum that has been popping up frequently in this book,

$$S_m(n) = 0^m + 1^m + 2^m + \cdots + (n-1)^m = \sum_{0 \le k < n} k^m.$$

This time we will try to analyze the problem with generating functions, in hopes that it will suddenly become simpler. We will consider n to be fixed and m variable; thus our goal is to understand the coefficients of the power series

$$S(z) = S_0(n) + S_1(n)\,z + S_2(n)\,z^2 + \cdots = \sum_{m \ge 0} S_m(n)\,z^m.$$

We know that the generating function for $\langle 1, k, k^2, \ldots\rangle$ is

$$\frac{1}{1 - kz} = \sum_{m \ge 0} k^m z^m,$$

hence

$$S(z) = \sum_{m \ge 0} \sum_{0 \le k < n} k^m z^m = \sum_{0 \le k < n} \frac{1}{1 - kz}$$

by interchanging the order of summation. We can put this sum in closed form,

$$S(z) = \frac{1}{z} \left(\frac{1}{z^{-1} - 0} + \frac{1}{z^{-1} - 1} + \cdots + \frac{1}{z^{-1} - n + 1}\right) = \frac{1}{z}\bigl(H_{z^{-1}} - H_{z^{-1} - n}\bigr); \tag{7.76}$$

but we know nothing about expanding such a closed form in powers of z.
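Exponential generating functions come to the rescue. The egf of our sequence $\langle S_0(n), S_1(n), S_2(n), \ldots\rangle$ is

$$\hat{S}(z, n) = S_0(n) + S_1(n)\frac{z}{1!} + S_2(n)\frac{z^2}{2!} + \cdots = \sum_{m \ge 0} S_m(n) \frac{z^m}{m!}.$$

To get these coefficients $S_m(n)$ we can use the egf for $\langle 1, k, k^2, \ldots\rangle$, namely

$$e^{kz} = \sum_{m \ge 0} k^m \frac{z^m}{m!},$$

and we have

$$\hat{S}(z, n) = \sum_{m \ge 0} \sum_{0 \le k < n} k^m \frac{z^m}{m!} = \sum_{0 \le k < n} e^{kz}.$$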
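
And the latter sum is a geometric progression, so there’s a closed form

$$\hat{S}(z, n) = \frac{e^{nz} - 1}{e^z - 1}. \tag{7.77}$$

All we need to do is figure out the coefficients of this relatively simple function, and we’ll know $S_m(n)$, because $S_m(n) = m!\,[z^m]\,\hat{S}(z, n)$.

Here’s where Bernoulli numbers come into the picture. We observed a moment ago that the egf for Bernoulli numbers is

$$\hat{B}(z) = \sum_{k \ge 0} B_k \frac{z^k}{k!} = \frac{z}{e^z - 1};$$

hence we can write

$$\hat{S}(z, n) = \hat{B}(z)\,\frac{e^{nz} - 1}{z} = \left(B_0 \frac{z^0}{0!} + B_1 \frac{z^1}{1!} + B_2 \frac{z^2}{2!} + \cdots\right) \left(n \frac{z^0}{1!} + n^2 \frac{z^1}{2!} + n^3 \frac{z^2}{3!} + \cdots\right).$$

The sum $S_m(n)$ is m! times the coefficient of $z^m$ in this product. For example,

$$\begin{aligned}
S_0(n) &= 0! \left(B_0 \frac{n}{0!\,1!}\right) = n; \\
S_1(n) &= 1! \left(B_0 \frac{n^2}{0!\,2!} + B_1 \frac{n}{1!\,1!}\right) = \tfrac{1}{2}n^2 - \tfrac{1}{2}n; \\
S_2(n) &= 2! \left(B_0 \frac{n^3}{0!\,3!} + B_1 \frac{n^2}{1!\,2!} + B_2 \frac{n}{2!\,1!}\right) = \tfrac{1}{3}n^3 - \tfrac{1}{2}n^2 + \tfrac{1}{6}n.
\end{aligned}$$

We have therefore derived the formula $\Box_{n-1} = S_2(n) = \frac{1}{3}n(n - \frac{1}{2})(n - 1)$ for the umpteenth time, and this was the simplest derivation of all: In a few lines we have found the general behavior of $S_m(n)$ for all m.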
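
Extracting the coefficient of $z^m$ from the product of the two series and multiplying by m! gives $S_m(n) = \frac{1}{m+1} \sum_{k=0}^{m} \binom{m+1}{k} B_k\, n^{m+1-k}$. The sketch below, with hypothetical function names, evaluates this expression with exact fractions and compares it against the sum $0^m + 1^m + \cdots + (n-1)^m$ computed term by term.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max):
    """[B_0, ..., B_{n_max}] from sum_{k<n} binom(n, k) B_k = [n = 1]."""
    b = []
    for n in range(1, n_max + 2):
        rhs = Fraction(1 if n == 1 else 0)
        known = sum(comb(n, k) * b[k] for k in range(n - 1))
        b.append((rhs - known) / n)
    return b


def power_sum(m, n):
    """S_m(n) = m! [z^m] B(z) (e^{nz} - 1)/z, written with binomials."""
    b = bernoulli_numbers(m)
    total = sum(
        comb(m + 1, k) * b[k] * Fraction(n) ** (m + 1 - k)
        for k in range(m + 1)
    )
    return total / (m + 1)


print([power_sum(2, n) for n in range(6)])
print([sum(k**2 for k in range(n)) for n in range(6)])
```

Python evaluates `0**0` as 1, which matches the convention needed for $S_0(n) = n$.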

The general formula can be written

$$S_{m-1}(n) = \frac{1}{m}\bigl(B_m(n) - B_m(0)\bigr), \tag{7.78}$$

where $B_m(x)$ is the Bernoulli polynomial defined by

$$B_m(x) = \sum_k \binom{m}{k} B_k\, x^{m-k}. \tag{7.79}$$

Here’s why: The Bernoulli polynomial is the binomial convolution of the sequence $\langle B_0, B_1, B_2, \ldots\rangle$ with $\langle 1, x, x^2, \ldots\rangle$; hence the exponential generating function for $\langle B_0(x), B_1(x), B_2(x), \ldots\rangle$ is the product of their egf’s,

$$\hat{B}(z, x) = \sum_{m \ge 0} B_m(x) \frac{z^m}{m!} = \frac{z}{e^z - 1} \sum_{m \ge 0} x^m \frac{z^m}{m!} = \frac{z e^{xz}}{e^z - 1}. \tag{7.80}$$

Equation (7.78) follows because the egf for $\langle 0, S_0(n), 2S_1(n), \ldots\rangle$ is, by (7.77),

$$z\,\frac{e^{nz} - 1}{e^z - 1} = \hat{B}(z, n) - \hat{B}(z, 0).$$

Let’s turn now to another problem for which egf’s are just the thing: How many spanning trees are possible in the complete graph on n vertices $\{1, 2, \ldots, n\}$? Let’s call this number $t_n$. The complete graph has $\frac{1}{2}n(n-1)$ edges, one edge joining each pair of distinct vertices; so we’re essentially looking for the total number of ways to connect up n given things by drawing n − 1 lines between them.

We have $t_1 = t_2 = 1$. Also $t_3 = 3$, because a complete graph on three vertices is a fan of order 2; we know that $f_2 = 3$. And there are sixteen spanning trees when n = 4:

[Figure: the sixteen spanning trees of the complete graph on four vertices, drawn in two rows.] (7.81)

Hence $t_4 = 16$.

Our experience with the analogous problem for fans suggests that the best way to tackle this problem is to single out one vertex, and to look at the blocks or components that the spanning tree joins together when we ignore all edges that touch the special vertex. If the non-special vertices form m components of sizes $k_1, k_2, \ldots, k_m$, then we can connect them to the special vertex in $k_1 k_2 \ldots k_m$ ways. For example, in the case n = 4, we can consider the lower left vertex to be special. The top row of (7.81) shows $3t_3$ cases where the other three vertices are joined among themselves in $t_3$ ways and then connected to the lower left in 3 ways. The bottom row shows $2 \cdot 1 \times t_2 t_1 \times \binom{3}{2}$ solutions where the other three vertices are divided into components of sizes 2 and 1 in $\binom{3}{2}$ ways; there’s also the case where the other three vertices are completely unconnected among themselves.

This line of reasoning leads to the recurrence

$$t_n = \sum_{m > 0} \frac{1}{m!} \sum_{k_1+k_2+\cdots+k_m=n-1} \binom{n-1}{k_1, k_2, \ldots, k_m} k_1 k_2 \ldots k_m\, t_{k_1} t_{k_2} \ldots t_{k_m}$$

for all n > 1. Here’s why: There are $\binom{n-1}{k_1, k_2, \ldots, k_m}$ ways to assign n − 1 elements to a sequence of m components of respective sizes $k_1, k_2, \ldots, k_m$; there are $t_{k_1} t_{k_2} \ldots t_{k_m}$ ways to connect up those individual components with spanning trees; there are $k_1 k_2 \ldots k_m$ ways to connect vertex n to those components; and we divide by m! because we want to disregard the order of the components. For example, when n = 4 the recurrence says that

$$t_4 = 3t_3 + \tfrac{1}{2}\left(\binom{3}{1,2} 2t_1 t_2 + \binom{3}{2,1} 2t_2 t_1\right) + \tfrac{1}{6}\left(\binom{3}{1,1,1} t_1^3\right) = 3t_3 + 6t_2 t_1 + t_1^3.$$
The recurrence for $t_n$ looks formidable at first, possibly even frightening; but it really isn’t bad, only convoluted. We can define

$$u_n = n\,t_n,$$

and then everything simplifies considerably:

$$\frac{u_n}{n!} = \sum_{m > 0} \frac{1}{m!} \sum_{k_1+k_2+\cdots+k_m=n-1} \frac{u_{k_1}}{k_1!}\, \frac{u_{k_2}}{k_2!} \cdots \frac{u_{k_m}}{k_m!}, \qquad \text{if } n > 1. \tag{7.82}$$


The inner sum is the coefficient of $z^{n-1}$ in the egf $\hat{U}(z)$, raised to the mth power; and we obtain the correct formula also when n = 1 if we add in the term $\hat{U}(z)^0$ that corresponds to the case m = 0. So

$$\frac{u_n}{n!} = [z^{n-1}] \sum_{m \ge 0} \frac{1}{m!} \hat{U}(z)^m = [z^{n-1}]\, e^{\hat{U}(z)} = [z^n]\, z e^{\hat{U}(z)} \tag{7.83}$$

for all n > 0, and we have the equation

$$\hat{U}(z) = z e^{\hat{U}(z)}.$$

Progress! Equation (7.83) is almost like

$$\mathcal{E}(z) = e^{z\mathcal{E}(z)},$$

which defines the generalized exponential series $\mathcal{E}(z) = \mathcal{E}_1(z)$ in (5.59) and (7.70); indeed, we have

$$\hat{U}(z) = z\,\mathcal{E}(z).$$

So we can read off the answer to our problem:

$$t_n = \frac{u_n}{n} = \frac{n!}{n} [z^n]\, \hat{U}(z) = (n-1)!\, [z^{n-1}]\, \mathcal{E}(z) = n^{n-2}. \tag{7.84}$$

The complete graph on $\{1, 2, \ldots, n\}$ has exactly $n^{n-2}$ spanning trees, for all n > 0.
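
The functional equation $\hat{U}(z) = z e^{\hat{U}(z)}$ can be unwound coefficient by coefficient. The sketch below is one possible way to do so and is not taken from the text: writing $E(z) = e^{\hat{U}(z)}$, differentiation gives $E' = \hat{U}' E$, so $n\,e_n = \sum_{k=1}^{n} k\,a_k\,e_{n-k}$ where $a_k = u_k/k!$, while the functional equation itself says $a_n = e_{n-1}$. The names `a`, `e`, and `spanning_tree_counts` are illustrative.

```python
from fractions import Fraction
from math import factorial


def spanning_tree_counts(n_max):
    """Return [t_1, ..., t_{n_max}] derived from U(z) = z exp(U(z))."""
    a = [Fraction(0)] * (n_max + 1)  # a[n] = u_n / n!
    e = [Fraction(0)] * (n_max + 1)  # e[n] = [z^n] exp(U(z))
    e[0] = Fraction(1)
    counts = []
    for n in range(1, n_max + 1):
        a[n] = e[n - 1]  # from U(z) = z exp(U(z))
        # From E' = U' E: n e_n = sum_{k=1..n} k a_k e_{n-k}.
        e[n] = sum(k * a[k] * e[n - k] for k in range(1, n + 1)) / n
        u_n = a[n] * factorial(n)
        counts.append(int(u_n / n))  # t_n = u_n / n
    return counts


print(spanning_tree_counts(7))
print([n ** (n - 2) for n in range(2, 8)])
```

The first list starts at n = 1 and the second at n = 2, so the second should match the first with its leading entry removed.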

7.7 DIRICHLET GENERATING FUNCTIONS

There are many other possible ways to generate a sequence from a series; any system of “kernel” functions $K_n(z)$ such that

$$\sum_n g_n K_n(z) = 0 \implies g_n = 0 \text{ for all } n$$

can be used, at least in principle. Ordinary generating functions use $K_n(z) = z^n$, and exponential generating functions use $K_n(z) = z^n/n!$; we could also try falling factorial powers $z^{\underline{n}}$, or binomial coefficients $z^{\underline{n}}/n! = \binom{z}{n}$.

The most important alternative to gf’s and egf’s uses the kernel functions $1/n^z$; it is intended for sequences $\langle g_1, g_2, \ldots\rangle$ that begin with n = 1 instead of n = 0:

$$\tilde{G}(z) = \sum_{n \ge 1} \frac{g_n}{n^z}. \tag{7.85}$$

This is called a Dirichlet generating function (dgf), because the German mathematician Gustav Lejeune Dirichlet (1805-1859) made much of it.

For example, the dgf of the constant sequence $\langle 1, 1, 1, \ldots\rangle$ is

$$\sum_{n \ge 1} \frac{1}{n^z} = \zeta(z). \tag{7.86}$$

This is Riemann’s zeta function, which we have also called the generalized harmonic number $H_\infty^{(z)}$ when z > 1.

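The product of Dirichlet generating functions corresponds to a special kind of convolution:

$$\tilde{F}(z)\,\tilde{G}(z) = \sum_{l, m \ge 1} \frac{f_l}{l^z}\, \frac{g_m}{m^z} = \sum_{n \ge 1} \frac{1}{n^z} \sum_{l, m \ge 1} f_l\, g_m\, [l \cdot m = n].$$

Thus $\tilde{F}(z)\,\tilde{G}(z) = \tilde{H}(z)$ is the dgf of the sequence

$$h_n = \sum_{d \backslash n} f_d\, g_{n/d}. \tag{7.87}$$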

For example, we know from (4.55) that $\sum_{d \backslash n} \mu(d) = [n = 1]$; this is the Dirichlet convolution of the Möbius sequence $\langle \mu(1), \mu(2), \mu(3), \ldots\rangle$ with $\langle 1, 1, 1, \ldots\rangle$, hence

$$\tilde{M}(z)\,\zeta(z) = \sum_{n \ge 1} \frac{[n = 1]}{n^z} = 1. \tag{7.88}$$

In other words, the dgf of $\langle \mu(1), \mu(2), \mu(3), \ldots\rangle$ is $\zeta(z)^{-1}$.
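
The divisor-sum convolution (7.87) is simple to compute for a finite range. The sketch below uses hypothetical function names: it implements the Möbius function by trial division, forms its Dirichlet convolution with the constant sequence, and prints the result, which by (7.88) should be 1 followed by zeros.

```python
def mobius(n):
    """Moebius function via trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # a squared prime divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def dirichlet_convolution(f, g, n_max):
    """Return [h_1, ..., h_{n_max}] with h_n = sum over d dividing n of f(d) g(n/d)."""
    h = [0] * (n_max + 1)
    for d in range(1, n_max + 1):
        for q in range(1, n_max // d + 1):
            h[d * q] += f(d) * g(q)
    return h[1:]


print(dirichlet_convolution(mobius, lambda n: 1, 20))
```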

Dirichlet generating functions are particularly valuable when the sequence is a multiplicative function, namely when

$$g_{mn} = g_m\, g_n \qquad \text{for } m \perp n.$$

In such cases the values of $g_n$ for all n are determined by the values of $g_n$ when n is a power of a prime, and we can factor the dgf into a product over primes:

$$\tilde{G}(z) = \prod_{p \text{ prime}} \left(1 + \frac{g_p}{p^z} + \frac{g_{p^2}}{p^{2z}} + \frac{g_{p^3}}{p^{3z}} + \cdots\right). \tag{7.89}$$

If, for instance, we set $g_n = 1$ for all n, we obtain a product representation of Riemann’s zeta function:

$$\zeta(z) = \prod_{p \text{ prime}} \left(\frac{1}{1 - p^{-z}}\right). \tag{7.90}$$

The Möbius function has $\mu(p) = -1$ and $\mu(p^k) = 0$ for k > 1, hence its dgf is

$$\tilde{M}(z) = \prod_{p \text{ prime}} \bigl(1 - p^{-z}\bigr); \tag{7.91}$$

this agrees, of course, with (7.88) and (7.90). Euler’s φ function has $\varphi(p^k) = p^k - p^{k-1}$, hence its dgf has the factored form

$$\tilde{\Phi}(z) = \prod_{p \text{ prime}} \left(1 + \frac{p - 1}{p^z - p}\right) = \prod_{p \text{ prime}} \frac{1 - p^{-z}}{1 - p^{1-z}}. \tag{7.92}$$

We conclude that $\tilde{\Phi}(z) = \zeta(z - 1)/\zeta(z)$.
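
Read through the convolution rule (7.87), the identity $\tilde{\Phi}(z) = \zeta(z-1)/\zeta(z)$ says that φ is the Dirichlet convolution of μ with the identity sequence $\langle n\rangle$, whose dgf is $\zeta(z - 1)$; that is, $\varphi(n) = \sum_{d \backslash n} \mu(d)\,(n/d)$. The sketch below checks this for small n, computing φ independently by counting integers relatively prime to n. The function names are hypothetical.

```python
from math import gcd


def mobius(n):
    """Moebius function via trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def phi_by_counting(n):
    """Euler's phi: how many k in 1..n are relatively prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def phi_by_convolution(n):
    """Sum over divisors d of n of mu(d) * (n / d)."""
    return sum(mobius(d) * (n // d) for d in range(1, n + 1) if n % d == 0)


print([phi_by_counting(n) for n in range(1, 13)])
print([phi_by_convolution(n) for n in range(1, 13)])
```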


Exercises 

Warmups 

1 An eccentric collector of 2 × n domino tilings pays $4 for each vertical domino and $1 for each horizontal domino. How many tilings are worth exactly $m by this criterion? For example, when m = 6 there are three solutions: [figures of the three tilings].

2 Give the generating function and the exponential generating function for the sequence ⟨2, 5, 13, 35, ...⟩ = ⟨2^n + 3^n⟩ in closed form.

3 What is Σ_{n≥0} H_n/10^n?

4 The  general  expansion  theorem  for  rational  functions  P(z)/Q(z)  is  not 
completely  general,  because  it  restricts  the  degree  of  P to  be  less  than 
the  degree  of  Q.  What  happens  if  P has  a larger  degree  than  this? 

5 Find a generating function S(z) such that


Basics 

6 Show  that  the  recurrence  (7.32)  can  be  solved  by  the  repertoire  method, 
without  using  generating  functions. 

7 Solve  the  recurrence 


$$g_0 = 1;$$

$$g_n = g_{n-1} + 2g_{n-2} + \cdots + n\,g_0, \qquad\text{for } n > 0.$$


8 What is $[z^n]\,(\ln(1-z))^2/(1-z)^{m+1}$?

9 Use the result of the previous exercise to evaluate $\sum_{k=0}^{n} H_k H_{n-k}$.

10 Set $r = s = -1/2$ in identity (7.61) and then remove all occurrences of
1/2 by using tricks like (5.36). What amazing identity do you deduce?

11 This problem, whose three parts are independent, gives practice in the
manipulation of generating functions. We assume that $A(z) = \sum_n a_n z^n$,
$B(z) = \sum_n b_n z^n$, $C(z) = \sum_n c_n z^n$, and that the coefficients are zero for
negative n.

a If $c_n = \sum_{j+2k\le n} a_j b_k$, express C in terms of A and B.

b If $n\,b_n = \sum_{k=0}^{n} 2^k a_k/(n-k)!$, express A in terms of B.

c If r is a real number and if $a_n = \sum_{k=0}^{n} \binom{r+k}{k} b_{n-k}$, express A in
terms of B; then use your formula to find coefficients $f_k(r)$ such that
$b_n = \sum_{k=0}^{n} f_k(r)\,a_{n-k}$.

12  How  many  ways  are  there  to  put  the  numbers  {1,2,...  ,2n}  into  a 2 x n 
array  so  that  rows  and  columns  are  in  increasing  order  from  left  to  right 
and  from  top  to  bottom?  For  example,  one  solution  when  n = 5 is 


I deduce  that  Clark 
Kent  is  really 
Superman. 


$$\begin{pmatrix} 1 & 2 & 4 & 5 & 8 \\ 3 & 6 & 7 & 9 & 10 \end{pmatrix}.$$


13  Prove  Raney’s  generalized  lemma,  which  is  stated  just  before  (7.69). 

14  Solve  the  recurrence 


$$g_0 = 0, \qquad g_1 = 1,$$

$$g_n = -2n\,g_{n-1} + \sum_k \binom{n}{k} g_k\,g_{n-k}, \qquad\text{for } n > 1,$$


by  using  an  exponential  generating  function. 
15 The Bell number $b_n$ is the number of ways to partition n things into
subsets. For example, $b_3 = 5$ because we can partition {1,2,3} in the
following ways:

{1,2,3};  {1,2} ∪ {3};  {1,3} ∪ {2};  {1} ∪ {2,3};  {1} ∪ {2} ∪ {3}.

Prove that $b_{n+1} = \sum_k \binom{n}{k} b_{n-k}$, and use this recurrence to find a closed
form for the exponential generating function $\sum_n b_n z^n/n!$.

16 Two sequences $\langle a_n\rangle$ and $\langle b_n\rangle$ are related by the convolution formula


also $a_0 = 0$ and $b_0 = 1$. Prove that the corresponding generating functions
satisfy $\ln B(z) = A(z) + \frac12 A(z^2) + \frac13 A(z^3) + \cdots$.

17 Show that the exponential generating function $\hat G(z)$ of a sequence is
related to the ordinary generating function $G(z)$ by the formula

$$\int_0^\infty \hat G(zt)\,e^{-t}\,dt = G(z),$$

if the integral exists.

18 Find the Dirichlet generating functions for the sequences

a 9n  = 

b $g_n = \ln n$;

c gn  = [n  is  squarefree]. 

Express  your  answers  in  terms  of  the  zeta  function.  (Squarefreeness  is 
defined  in  exercise  4.13.) 

19 Every power series $F(z) = \sum_{n\ge 0} f_n z^n$ with $f_0 = 1$ defines a sequence of
polynomials $f_n(x)$ by the rule

$$F(z)^x = \sum_{n\ge 0} f_n(x)\,z^n,$$

where $f_n(1) = f_n$ and $f_n(0) = [n = 0]$. In general, $f_n(x)$ has degree n.
Show that such polynomials always satisfy the convolution formulas

$$\sum_{k=0}^{n} f_k(x)\,f_{n-k}(y) = f_n(x+y);$$

$$(x+y)\sum_{k=0}^{n} k\,f_k(x)\,f_{n-k}(y) = x\,n\,f_n(x+y).$$

(The identities in Tables 202 and 258 are special cases of this trick.)
20 A power series G(z) is called differentiably finite if there exist finitely
many polynomials $P_0(z), \ldots, P_m(z)$, not all zero, such that

$$P_0(z)G(z) + P_1(z)G'(z) + \cdots + P_m(z)G^{(m)}(z) = 0.$$

A sequence of numbers $\langle g_0, g_1, g_2, \ldots\rangle$ is called polynomially recursive
if there exist finitely many polynomials $p_0(z), \ldots, p_m(z)$, not all zero,
such that

$$p_0(n)g_n + p_1(n)g_{n+1} + \cdots + p_m(n)g_{n+m} = 0$$

for all integers n ≥ 0. Prove that a generating function is differentiably
finite if and only if its sequence of coefficients is polynomially recursive.

Homework  exercises 

21  A robber  holds  up  a bank  and  demands  $500  in  tens  and  twenties.  He 

also  demands  to  know  the  number  of  ways  in  which  the  cashier  can  give 
him the money. Find a generating function $G(z)$ for which this number
is $[z^{500}]\,G(z)$, and a more compact generating function $\check G(z)$ for which
this number is $[z^{50}]\,\check G(z)$. Determine the required number of ways by
(a)  using  partial  fractions;  (b)  using  a method  like  (7.39). 

22 Let P be the sum of all ways to “triangulate” polygons:

P = _ + △ + ··· [figure: the further terms are pictures of triangulated polygons]

(The first term represents a degenerate polygon with only two vertices;
every other term shows a polygon that has been divided into triangles.
For example, a pentagon can be triangulated in five ways.) Define a
“multiplication” operation A△B on triangulated polygons A and B so
that the equation

P = _ + P△P

is valid. Then replace each triangle by ‘z’; what does this tell you about
the number of ways to decompose an n-gon into triangles?

23  In  how  many  ways  can  a 2 x 2 x n pillar  be  built  out  of  2 x 1 x 1 bricks? 

24  How  many  spanning  trees  are  in  an  n-wheel  (a  graph  with  n “outer” 
vertices in a cycle, each connected to an (n + 1)st “hub” vertex), when
n ≥ 3?




Will  he  settle  for 
2 x n domino 

tilings? 


At  union  rates,  as 
many  as  you  can 
afford, plus a few.




25 Let m ≥ 2 be an integer. What is a closed form for the generating
function of the sequence $\langle n \bmod m\rangle$, as a function of z and m? Use
this generating function to express ‘n mod m’ in terms of the complex
number $\omega = e^{2\pi i/m}$. (For example, when m = 2 we have ω = −1 and
$n \bmod 2 = \frac12 - \frac12(-1)^n$.)

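26 The second-order Fibonacci numbers $\langle\mathfrak F_n\rangle$ are defined by the recurrence

$$\mathfrak F_0 = 0; \qquad \mathfrak F_1 = 1;$$

$$\mathfrak F_n = \mathfrak F_{n-1} + \mathfrak F_{n-2} + F_n \qquad\text{for } n > 1.$$

Express $\mathfrak F_n$ in terms of the usual Fibonacci numbers $F_n$ and $F_{n+1}$.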

27 A 2 × n domino tiling can also be regarded as a way to draw n disjoint
lines in a 2 × n array of points:

[figure: a 2 × n tiling drawn as n disjoint lines]

If we superimpose two such patterns, we get a set of cycles, since every
point is touched by two lines. For example, if the lines above are
combined with the lines

[figure: a second line pattern]

the result is

[figure: the resulting set of cycles]

The same set of cycles is also obtained by combining

[figure: two other line patterns]

But we get a unique way to reconstruct the original patterns from the
superimposed ones if we assign orientations to the vertical lines by using
arrows that go alternately up/down/up/down/··· in the first pattern and
alternately down/up/down/up/··· in the second. For example,

[figure: two oriented patterns and their superposition]

The number of such oriented cycle patterns must therefore be $T_n^2 = F_{n+1}^2$,
and we should be able to prove this via algebra. Let $Q_n$ be the number
of oriented 2 × n cycle patterns. Find a recurrence for $Q_n$, solve it with
generating functions, and deduce algebraically that $Q_n = F_{n+1}^2$.

28 The coefficients of A(z) in (7.39) satisfy $A_r + A_{r+10} + A_{r+20} + A_{r+30} = 100$
for 0 ≤ r < 10. Find a “simple” explanation for this.
29 What is the sum of Fibonacci products

$$\sum_{m>0}\ \sum_{\substack{k_1+k_2+\cdots+k_m=n \\ k_1,k_2,\ldots,k_m>0}} F_{k_1}F_{k_2}\ldots F_{k_m}\,?$$

30 If the generating function $G(z) = 1/(1-\alpha z)(1-\beta z)$ has the partial
fraction decomposition $a/(1-\alpha z) + b/(1-\beta z)$, what is the partial fraction
decomposition of $G(z)^n$?

31 What function g(n) of the positive integer n satisfies the recurrence

$$\sum_{d\backslash n} g(d)\,\varphi(n/d) = 1,$$

where φ is Euler’s totient function?

32 An arithmetic progression is an infinite set of integers

{an + b} = {b, a + b, 2a + b, 3a + b, ...}.

A set of arithmetic progressions $\{a_1 n + b_1\}, \ldots, \{a_m n + b_m\}$ is called an
exact cover if every nonnegative integer occurs in one and only one of the
progressions. For example, the three progressions {2n}, {4n + 1}, {4n + 3}
constitute an exact cover. Show that if $\{a_1 n + b_1\}, \ldots, \{a_m n + b_m\}$ is an
exact cover such that $2 \le a_1 \le \cdots \le a_m$, then $a_{m-1} = a_m$. Hint: Use
generating functions.

Exam  problems 

33 What is $[w^m z^n]\,(\ln(1+z))/(1-wz)$?

34 Find a closed form for the generating function $\sum_{n\ge 0} G_n(z)\,w^n$, if


(Here  m is  a fixed  positive  integer.) 

35 Evaluate the sum $\sum_{0<k<n} 1/k(n-k)$ in two ways:
a Expand  the  summand  in  partial  fractions. 

b Treat  the  sum  as  a convolution  and  use  generating  functions. 

36 Let A(z) be the generating function for $\langle a_0, a_1, a_2, a_3, \ldots\rangle$. Express
$\sum_n a_{\lfloor n/m\rfloor} z^n$ in terms of A, z, and m.
37 Let $a_n$ be the number of ways to write the positive integer n as a sum of
powers of 2, disregarding order. For example, $a_4 = 4$, since 4 = 2 + 2 =
2 + 1 + 1 = 1 + 1 + 1 + 1. By convention we let $a_0 = 1$. Let $b_n = \sum_{k=0}^{n} a_k$
be the cumulative sum of the first a’s.

a Make a table of the a’s and b’s up through n = 10. What amazing
relation do you observe in your table? (Don’t prove it yet.)
b Express the generating function A(z) as an infinite product.
c Use the expression from part (b) to prove the result of part (a).

38 Find a closed form for the double generating function

$$M(w,z) = \sum_{m,n\ge 0} \min(m,n)\,w^m z^n.$$

Generalize your answer to obtain, for fixed m ≥ 2, a closed form for

$$M(z_1,\ldots,z_m) = \sum_{n_1,\ldots,n_m\ge 0} \min(n_1,\ldots,n_m)\,z_1^{n_1}\ldots z_m^{n_m}.$$


39 Given positive integers m and n, find closed forms for

$$\sum_{1\le k_1<k_2<\cdots<k_m\le n} k_1k_2\ldots k_m \qquad\text{and}\qquad \sum_{1\le k_1\le k_2\le\cdots\le k_m\le n} k_1k_2\ldots k_m.$$

(For example, when m = 2 and n = 3 the sums are 1·2 + 1·3 + 2·3 and
1·1 + 1·2 + 1·3 + 2·2 + 2·3 + 3·3.) Hint: What are the coefficients of $z^m$ in the
generating functions $(1+a_1z)\ldots(1+a_nz)$ and $1/(1-a_1z)\ldots(1-a_nz)$?

40 Express $\sum_k \binom{n}{k}(kF_{k-1} - F_k)\,(n-k)¡$ in closed form.

41 An up-down permutation of order n is an arrangement $a_1a_2\ldots a_n$ of
the integers {1, 2, ..., n} that goes alternately up and down:

$$a_1 < a_2 > a_3 < a_4 > \cdots.$$

For example, 35142 is an up-down permutation of order 5. If $A_n$ denotes
the number of up-down permutations of order n, show that the
exponential generating function of $\langle A_n\rangle$ is (1 + sin z)/cos z.

42  A space  probe  has  discovered  that  organic  material  on  Mars  has  DNA 
composed  of  five  symbols,  denoted  by  (a,  b,  c,  d,  e),  instead  of  the  four 
components  in  earthling  DNA.  The  four  pairs  cd,  ce,  ed,  and  ee  never 
occur consecutively in a string of Martian DNA, but any string without
forbidden pairs is possible. (Thus bbcda is forbidden but bbdca is
OK.) How many Martian DNA strings of length n are possible? (When
n = 2 the  answer  is  21,  because  the  left  and  right  ends  of  a string  are 
distinguishable.) 
43 The Newtonian generating function of a sequence $\langle g_n\rangle$ is defined to be


Find a convolution formula that defines the relation between sequences
$\langle f_n\rangle$, $\langle g_n\rangle$, and $\langle h_n\rangle$ whose Newtonian generating functions are related
by the equation F(z)G(z) = H(z). Try to make your formula as simple
and symmetric as possible.

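44 Let $q_n$ be the number of possible outcomes when n numbers $\{x_1, \ldots, x_n\}$
are compared with each other. For example, $q_3 = 13$ because the
possibilities are

$x_1 < x_2 < x_3$;  $x_1 < x_2 = x_3$;  $x_1 < x_3 < x_2$;  $x_1 = x_2 < x_3$;
$x_1 = x_2 = x_3$;  $x_1 = x_3 < x_2$;  $x_2 < x_1 < x_3$;
$x_2 < x_1 = x_3$;  $x_2 < x_3 < x_1$;  $x_2 = x_3 < x_1$;
$x_3 < x_1 < x_2$;  $x_3 < x_1 = x_2$;  $x_3 < x_2 < x_1$.

Find a closed form for the egf $\hat Q(z) = \sum_n q_n z^n/n!$. Also find sequences
$\langle a_n\rangle$, $\langle b_n\rangle$, $\langle c_n\rangle$ such that

$$q_n = \sum_{k\ge 0} k^n a_k = \sum_{k\ge 0} {n \brace k}\, b_k = \sum_{k\ge 0} \left\langle{n \atop k}\right\rangle c_k, \qquad\text{for all } n > 0.$$

45 Evaluate $\sum_{m,n>0} [m \perp n]/m^2n^2$.

46 Evaluate


in closed form. Hint: $z^3 - z^2 + \frac{4}{27} = (z + \frac13)(z - \frac23)^2$.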

47 Show that the numbers $U_n$ and $V_n$ of 3 × n domino tilings, as given in
(7.34), are closely related to the fractions in the Stern-Brocot tree that
converge to $\sqrt3$.

48 A certain sequence $\langle g_n\rangle$ satisfies the recurrence

$$a\,g_n + b\,g_{n+1} + c\,g_{n+2} + d = 0, \qquad\text{integer } n \ge 0,$$

for some integers (a, b, c, d) with gcd(a, b, c, d) = 1. It also has the closed
form

$$g_n = \bigl\lfloor \alpha(1+\sqrt2)^n \bigr\rfloor, \qquad\text{integer } n \ge 0,$$

for some real number α between 0 and 1. Find a, b, c, d, and α.
Kissinger, take note.


Is  this  a hint  or  a 

warning? 


49 This is a problem about powers and parity.

a Consider the sequence $\langle a_0, a_1, a_2, \ldots\rangle = \langle 2, 2, 6, \ldots\rangle$ defined by the
formula

$$a_n = (1+\sqrt2)^n + (1-\sqrt2)^n.$$

Find a simple recurrence relation that is satisfied by this sequence.
b Prove that $\lceil(1+\sqrt2)^n\rceil \equiv n \pmod 2$ for all integers n > 0.
c Find a number α of the form $(p + \sqrt q)/2$, where p and q are positive
integers, such that $\lfloor\alpha^n\rfloor \equiv n \pmod 2$ for all integers n > 0.

Bonus  problems 

50 Continuing exercise 22, consider the sum of all ways to decompose
polygons into polygons:

Q = _ + △ + □ + ··· [figure: the further terms are pictures of polygons divided into polygons]

Find a symbolic equation for Q and use it to find a generating function
for the number of ways to draw nonintersecting diagonals inside a convex
n-gon. (Give a closed form for the generating function as a function of z;
you need not find a closed form for the coefficients.)

51  Prove  that  the  product 


1 ^k^Tl 


is  the  generating  function  for  tilings  of  an  m x n rectangle  with  dominoes. 
(There  are  mn  factors,  which  we  can  imagine  are  written  in  the  mn  cells 
of  the  rectangle.  If  mn  is  odd,  the  middle  factor  is  zero.  The  coefficient 
of  O’  ak  is  the  number  of  ways  to  do  the  tiling  with  j vertical  and  k 
horizontal  dominoes.)  Hint:  This  is  a difficult  problem,  really  beyond 
the  scope  of  this  book.  You  may  wish  to  simply  verify  the  formula  in  the 
case m = 3, n = 4.

52 Prove that the polynomials defined by the recurrence

$$p_n(y) = \Bigl(y - \frac14\Bigr)^{n} - \sum_{k=0}^{n-1} \binom{2n}{2k} \Bigl(\frac{-1}{4}\Bigr)^{n-k} p_k(y), \qquad\text{integer } n \ge 0,$$

have the form $p_n(y) = \sum_{m=0}^{n} \left|{n \atop m}\right| y^m$, where $\left|{n \atop m}\right|$ is a positive integer for
1 ≤ m ≤ n. Hint: This exercise is very instructive but not very easy.
53 The sequence of pentagonal numbers $\langle 1, 5, 12, 22, \ldots\rangle$ generalizes the
triangular and square numbers in an obvious way:


Let the nth triangular number be $T_n = n(n+1)/2$; let the nth pentagonal
number be $P_n = n(3n-1)/2$; and let $U_n$ be the 3 × n domino-tiling
number defined in (7.38). Prove that the triangular number $T_{(U_{4n+2}-1)/2}$
is also a pentagonal number. Hint: $3U_{2n}^2 = (V_{2n-1} + V_{2n+1})^2 + 2$.

54 Consider the following curious construction:

    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
    1    2    3    4         6    7    8    9        11   12   13   14        16
    1    3    6   10        16   23   31   40        51   63   76   90       106
    1    3    6             16   23   31             51   63   76            106
    1    4   10             26   49   80            131  194  270            376
    1    4                  26   49                 131  194                 376
    1    5                  31   80                 211  405                 781
    1                       31                      211                      781
    1                       32                      243                     1024

(Start with a row containing all the positive integers. Then delete every
mth column; here m = 5. Then replace the remaining entries by partial
sums. Then delete every (m − 1)st column. Then replace with partial
sums again, and so on.) Use generating functions to show that the final
result is the sequence of mth powers. For example, when m = 5 we get
$\langle 1^5, 2^5, 3^5, 4^5, \ldots\rangle$ as shown.
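The construction is easy to simulate, which may help in forming conjectures before attempting the proof. The following is a minimal Python sketch, not a solution: it runs the delete/partial-sum process on a finite prefix of the positive integers (the function name and the `length` parameter are illustrative assumptions). Because deletions depend only on position and partial sums depend only on earlier entries, the truncated run yields a correct prefix of the final row.

```python
from itertools import accumulate

def curious_construction(m, length):
    """Apply the exercise's process to the row 1, 2, ..., length."""
    row = list(range(1, length + 1))
    for step in range(m, 1, -1):
        # Delete every step-th entry (positions counted from 1) ...
        row = [x for pos, x in enumerate(row, start=1) if pos % step != 0]
        # ... then replace the surviving entries by their partial sums.
        row = list(accumulate(row))
    return row

print(curious_construction(5, 16))  # [1, 32, 243, 1024]
```

With m = 5 and the sixteen starting entries of the table, the intermediate rows match the rows shown above.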

55 Prove that if the power series F(z) and G(z) are differentiably finite (as
defined in exercise 20), then so are F(z) + G(z) and F(z)G(z).

Research  problems 

56 Prove that there is no “simple closed form” for the coefficient of $z^n$ in
$(1 + z + z^2)^n$, as a function of n, in some large class of “simple closed
forms.”

57 Prove or disprove: If all the coefficients of G(z) are either 0 or 1, and if
all the coefficients of $G(z)^2$ are less than some constant M, then infinitely
many of the coefficients of $G(z)^2$ are zero.
Discrete Probability


THE ELEMENT OF CHANCE enters into many of our attempts to understand
the world we live in. A mathematical theory of probability allows us
to  calculate  the  likelihood  of  complex  events  if  we  assume  that  the  events  are 
governed  by  appropriate  axioms.  This  theory  has  significant  applications  in 
all  branches  of  science,  and  it  has  strong  connections  with  the  techniques  we 
have  studied  in  previous  chapters. 

Probabilities  are  called  “discrete”  if  we  can  compute  the  probabilities  of 
all  events  by  summation  instead  of  by  integration.  We  are  getting  pretty  good 
at  sums,  so  it  should  come  as  no  great  surprise  that  we  are  ready  to  apply 
our  knowledge  to  some  interesting  calculations  of  probabilities  and  averages. 


8.1  DEFINITIONS 


(Readers unfamiliar
with probability
theory will, with
high probability,
benefit from a
perusal of Feller’s
classic introduction
to the subject [96].)


Probability theory starts with the idea of a probability space, which
is a set Ω of all things that can happen in a given problem together with a
rule that assigns a probability Pr(ω) to each elementary event ω ∈ Ω. The
probability Pr(ω) must be a nonnegative real number, and the condition

$$\sum_{\omega\in\Omega} \Pr(\omega) = 1 \qquad (8.1)$$

must hold in every discrete probability space. Thus, each value Pr(ω) must lie
in the interval [0..1]. We speak of Pr as a probability distribution, because
it distributes a total probability of 1 among the events ω.

Here’s an example: If we’re rolling a pair of dice, the set Ω of elementary
events is D² = {⚀⚀, ⚀⚁, ..., ⚅⚅}, where


Never say die.


D = {⚀, ⚁, ⚂, ⚃, ⚄, ⚅}


is the set of all six ways that a given die can land. Two rolls such as ⚀⚁
and ⚁⚀ are considered to be distinct; hence this probability space has a
total of 6² = 36 elements.

We usually assume that dice are “fair,” namely that each of the six
possibilities for a particular die has probability 1/6, and that each of the 36 possible
rolls in Ω has probability 1/36. But we can also consider “loaded” dice in which
there is a different distribution of probabilities. For example, let


Careful: They
might go off.


Pr_1(⚀) = Pr_1(⚅) = 1/4;

Pr_1(⚁) = Pr_1(⚂) = Pr_1(⚃) = Pr_1(⚄) = 1/8.

Then $\sum_{d\in D} \Pr_1(d) = 1$, so Pr_1 is a probability distribution on the set D, and
we can assign probabilities to the elements of Ω = D² by the rule

$$\Pr_{11}(d\,d') = \Pr_1(d)\,\Pr_1(d'). \qquad (8.2)$$

For example, Pr_11(⚅⚂) = 1/4 · 1/8 = 1/32. This is a valid distribution because

$$\sum_{\omega\in\Omega} \Pr_{11}(\omega) = \sum_{dd'\in D^2} \Pr_{11}(d\,d') = \sum_{d,d'\in D} \Pr_1(d)\,\Pr_1(d')$$

$$\qquad = \sum_{d\in D} \Pr_1(d) \sum_{d'\in D} \Pr_1(d') = 1\cdot 1 = 1.$$

We can also consider the case of one fair die and one loaded die,

$$\Pr_{01}(d\,d') = \Pr_0(d)\,\Pr_1(d'), \qquad\text{where } \Pr_0(d) = \tfrac16, \qquad (8.3)$$

in which case Pr_01(⚅⚂) = 1/6 · 1/8 = 1/48. Dice in the “real world” can’t really
be expected to turn up equally often on each side, because there is not perfect
symmetry; but 1/6 is usually pretty close to the truth.
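These small probability spaces can be written out explicitly. The following Python sketch, using exact rational arithmetic, builds the three distributions on the 36 ordered rolls and checks condition (8.1) for each. Representing a face by its number of spots, and the names `pr0`, `pr1`, and `two_dice`, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # a die face is represented by its number of spots

# pr0: the fair die; pr1: the loaded die of the text (faces 1 and 6 favored).
pr0 = {d: Fraction(1, 6) for d in FACES}
pr1 = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}

def two_dice(first, second):
    """Pr(dd') = first(d) * second(d') on the ordered rolls, as in (8.2) and (8.3)."""
    return {(d, e): first[d] * second[e] for d, e in product(FACES, FACES)}

pr00 = two_dice(pr0, pr0)
pr11 = two_dice(pr1, pr1)
pr01 = two_dice(pr0, pr1)

# Each rule is a probability distribution: 36 elementary events, total probability 1.
for dist in (pr00, pr11, pr01):
    assert len(dist) == 36 and sum(dist.values()) == 1

print(pr11[(6, 3)], pr01[(6, 3)])  # prints 1/32 1/48
```

Using `Fraction` rather than floating point keeps the sums exact, so the check against 1 is an equality and not an approximation.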

An event is a subset of Ω. In dice games, for example, the set


If all sides of a cube
were identical, how
could we tell which
side is face up?


{⚀⚀, ⚁⚁, ⚂⚂, ⚃⚃, ⚄⚄, ⚅⚅}


is the event that “doubles are thrown.” The individual elements ω of Ω are
called elementary events because they cannot be decomposed into smaller
subsets; we can think of ω as a one-element event {ω}.

The probability of an event A is defined by the formula

$$\Pr(\omega\in A) = \sum_{\omega\in A} \Pr(\omega); \qquad (8.4)$$

and in general if R(ω) is any statement about ω, we write ‘Pr(R(ω))’ for the
sum of all Pr(ω) such that R(ω) is true. Thus, for example, the probability of
doubles with fair dice is 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6; but when both dice are
loaded with probability distribution Pr_1 it is 1/16 + 1/64 + 1/64 + 1/64 + 1/64 + 1/16 = 3/16.
Loading the dice makes the event “doubles are thrown” more probable.
(We have been using ∑-notation in a more general sense here than
defined in Chapter 2: The sums in (8.1) and (8.4) occur over all elements ω
of an arbitrary set, not over integers only. However, this new development is
not really alarming; we can agree to use special notation under a ∑ whenever
nonintegers are intended, so there will be no confusion with our ordinary
conventions. The other definitions in Chapter 2 are still valid; in particular, the
definition of infinite sums in that chapter gives the appropriate interpretation
to our sums when the set Ω is infinite. Each probability is nonnegative, and
the sum of all probabilities is bounded, so the probability of event A in (8.4)
is well defined for all subsets A ⊆ Ω.)

A random variable is a function defined on the elementary events ω of a
probability space. For example, if Ω = D² we can define S(ω) to be the sum
of the spots on the dice roll ω, so that S(⚅⚂) = 6 + 3 = 9. The probability
that the spots total seven is the probability of the event S(ω) = 7, namely

Pr(⚀⚅) + Pr(⚁⚄) + Pr(⚂⚃) + Pr(⚃⚂) + Pr(⚄⚁) + Pr(⚅⚀).

With fair dice (Pr = Pr_00), this happens with probability 1/6; with loaded dice
(Pr = Pr_11), it happens with probability 1/16 + 1/64 + 1/64 + 1/64 + 1/64 + 1/16 = 3/16,
the same as we observed for doubles.
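Definition (8.4) translates directly into a sum over the elementary events that satisfy a given statement. The following Python sketch computes the probabilities of “doubles” and of “the spots total seven” under the fair and the loaded distribution; the helper names `pr`, `doubles`, and `total_is_seven` are illustrative assumptions, and a roll is represented as an ordered pair of spot counts.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}

def two_dice(p):
    """Both dice follow the one-die distribution p; rolls are ordered pairs."""
    return {(d, e): p[d] * p[e] for d, e in product(FACES, FACES)}

def pr(space, statement):
    """Pr(R(w)): the sum of Pr(w) over all elementary events w with R(w) true."""
    return sum(p for w, p in space.items() if statement(w))

def doubles(w):
    return w[0] == w[1]

def total_is_seven(w):
    return w[0] + w[1] == 7

pr00, pr11 = two_dice(fair), two_dice(loaded)
print(pr(pr00, doubles), pr(pr11, doubles))                # prints 1/6 3/16
print(pr(pr00, total_is_seven), pr(pr11, total_is_seven))  # prints 1/6 3/16
```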

It’s customary to drop the ‘(ω)’ when we talk about random variables,
because there’s usually only one probability space involved when we’re
working on any particular problem. Thus we say simply ‘S = 7’ for the event that
a 7 was rolled, and ‘S = 4’ for the event {⚀⚂, ⚁⚁, ⚂⚀}.

A random variable can be characterized by the probability distribution of
its values. Thus, for example, S takes on eleven possible values {2, 3, ..., 12},
and we can tabulate the probability that S = s for each s in this set:

      s           2      3      4      5      6      7      8      9     10     11     12
  Pr_00[S = s]   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
  Pr_11[S = s]   4/64   4/64   5/64   6/64   7/64  12/64   7/64   6/64   5/64   4/64   4/64

If we’re working on a problem that involves only the random variable S and no
other properties of dice, we can compute the answer from these probabilities
alone, without regard to the details of the set Ω = D². In fact, we could
define the probability space to be the smaller set Ω = {2, 3, ..., 12}, with
whatever probability distribution Pr(s) is desired. Then ‘S = 4’ would be
an elementary event. Thus we can often ignore the underlying probability
space Ω and work directly with random variables and their distributions.
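The passage from a probability space to the distribution of a random variable amounts to grouping elementary events by the value the variable takes. As a hedged illustration, the Python sketch below recomputes the distribution of S under the fair and the loaded dice; the helper `distribution` is a hypothetical name, and the numerators are printed over the common denominators 36 and 64 for comparison with the tabulated values.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}

def two_dice(p):
    return {(d, e): p[d] * p[e] for d, e in product(FACES, FACES)}

def distribution(space, X):
    """Map each value x taken by the random variable X to Pr(X = x)."""
    dist = defaultdict(Fraction)       # Fraction() is zero
    for w, p in space.items():
        dist[X(w)] += p
    return dict(dist)

def S(w):
    return w[0] + w[1]                 # total number of spots

s00 = distribution(two_dice(fair), S)
s11 = distribution(two_dice(loaded), S)

print([int(s00[s] * 36) for s in range(2, 13)])  # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
print([int(s11[s] * 64) for s in range(2, 13)])  # [4, 4, 5, 6, 7, 12, 7, 6, 5, 4, 4]
```

Once `s00` or `s11` is in hand, it can itself serve as a probability space on {2, ..., 12}, which is the point made in the text.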

If two random variables X and Y are defined over the same probability
space Ω, we can characterize their behavior without knowing everything
about Ω if we know the “joint distribution”

Pr(X = x and Y = y)

for each x in the range of X and each y in the range of Y. We say that X and
Y are independent random variables if

$$\Pr(X=x \text{ and } Y=y) = \Pr(X=x)\cdot\Pr(Y=y) \qquad (8.5)$$

for all x and y. Intuitively, this means that the value of X has no effect on
the value of Y.

For example, if Ω is the set of dice rolls D², we can let S_1 be the number
of spots on the first die and S_2 the number of spots on the second. Then
the random variables S_1 and S_2 are independent with respect to each of the
probability distributions Pr_00, Pr_11, and Pr_01 discussed earlier, because we
defined the dice probability for each elementary event dd' as a product of a
probability for S_1 = d multiplied by a probability for S_2 = d'. We could have
defined probabilities differently so that, say,

Pr(⚀⚄) / Pr(⚀⚅) ≠ Pr(⚁⚄) / Pr(⚁⚅);

but we didn’t do that, because different dice aren’t supposed to influence each
other. With our definitions, both of these ratios are Pr(S_2 = 5) / Pr(S_2 = 6).

We have defined S to be the sum of the two spot values, S_1 + S_2. Let’s
consider another random variable P, the product S_1 S_2. Are S and P
independent? Informally, no; if we are told that S = 2, we know that P must be 1.
Formally, no again, because the independence condition (8.5) fails
spectacularly (at least in the case of fair dice): For all legal values of s and p, we
have 0 < Pr_00[S = s] · Pr_00[P = p] ≤ 1/6 · 1/9; this can’t equal Pr_00[S = s and P = p],
which is a multiple of 1/36.
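Condition (8.5) can be tested mechanically on a finite space by comparing the joint distribution with the product of the two separate distributions for every pair of values. The Python sketch below does this for the pair S_1, S_2 and for the pair S, P; the function `independent` and the other helper names are assumptions introduced only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}

def two_dice(first, second):
    return {(d, e): first[d] * second[e] for d, e in product(FACES, FACES)}

def distribution(space, X):
    """Map each value x of the random variable X to Pr(X = x)."""
    dist = defaultdict(Fraction)
    for w, p in space.items():
        dist[X(w)] += p
    return dist

def independent(space, X, Y):
    """Check (8.5) for every x in the range of X and every y in the range of Y."""
    px, py = distribution(space, X), distribution(space, Y)
    joint = distribution(space, lambda w: (X(w), Y(w)))
    return all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)

def S1(w): return w[0]
def S2(w): return w[1]
def S(w): return w[0] + w[1]
def P(w): return w[0] * w[1]

spaces = [two_dice(fair, fair), two_dice(loaded, loaded), two_dice(fair, loaded)]
print([independent(sp, S1, S2) for sp in spaces])  # [True, True, True]
print([independent(sp, S, P) for sp in spaces])    # [False, False, False]
```

The check also covers pairs (x, y) that never occur together, such as S = 2 and P = 2, where the joint probability is zero but the product of the separate probabilities is not.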

If we want to understand the typical behavior of a given random
variable, we often ask about its “average” value. But the notion of “average”
is ambiguous; people generally speak about three different kinds of averages
when a sequence of numbers is given:

the mean (which is the sum of all values, divided by the number of
values);

the median (which is the middle value, numerically);

the mode (which is the value that occurs most often).

For example, the mean of (3, 1, 4, 1, 5) is (3 + 1 + 4 + 1 + 5)/5 = 2.8; the median is 3;
the mode is 1.

Just Say No.


A dicey inequality.


But probability theorists usually work with random variables instead of
with sequences of numbers, so we want to define the notion of an “average” for
random variables too. Suppose we repeat an experiment over and over again,
making independent trials in such a way that each value of X occurs with
a frequency approximately proportional to its probability. (For example, we
might roll a pair of dice many times, observing the values of S and/or P.) We’d
like to define the average value of a random variable so that such experiments
will usually produce a sequence of numbers whose mean, median, or mode is
approximately the same as the mean, median, or mode of X, according to our
definitions.

Here’s how it can be done: The mean of a random real-valued variable X
on a probability space Ω is defined to be

$$\sum_{x\in X(\Omega)} x\cdot\Pr(X=x) \qquad (8.6)$$

if this potentially infinite sum exists. (Here X(Ω) stands for the set of all
values that X can assume.) The median of X is defined to be the set of all x
such that

$$\Pr(X\le x) \ge \tfrac12 \quad\text{and}\quad \Pr(X\ge x) \ge \tfrac12. \qquad (8.7)$$

And the mode of X is defined to be the set of all x such that

$$\Pr(X=x) \ge \Pr(X=x') \qquad\text{for all } x'\in X(\Omega). \qquad (8.8)$$

In our dice-throwing example, the mean of S turns out to be 2 · 1/36 + 3 · 2/36 +
··· + 12 · 1/36 = 7 in distribution Pr_00, and it also turns out to be 7 in
distribution Pr_11. The median and mode both turn out to be {7} as well,
in both distributions. So S has the same average under all three definitions.
On the other hand the P in distribution Pr_00 turns out to have a mean value
of 49/4 = 12.25; its median is {10}, and its mode is {6, 12}. The mean of P is
unchanged if we load the dice with distribution Pr_11, but the median drops
to {8} and the mode becomes {6} alone.
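Definitions (8.6), (8.7), and (8.8) can be evaluated directly for S and P. The Python sketch below does so under one loudly stated simplification: the median is searched only among the values that the variable actually takes, which is enough for these four cases but would miss an interval of medians when some cumulative probability equals exactly 1/2. All function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}

def distribution(p, X):
    """Distribution of X when both dice independently follow p."""
    dist = defaultdict(Fraction)
    for d, e in product(FACES, FACES):
        dist[X(d, e)] += p[d] * p[e]
    return dict(dist)

def mean(dist):                                   # definition (8.6)
    return sum(x * q for x, q in dist.items())

def median(dist):                                 # definition (8.7), attained values only
    half = Fraction(1, 2)
    return {x for x in dist
            if sum(q for y, q in dist.items() if y <= x) >= half
            and sum(q for y, q in dist.items() if y >= x) >= half}

def mode(dist):                                   # definition (8.8)
    top = max(dist.values())
    return {x for x, q in dist.items() if q == top}

for p in (fair, loaded):
    for X in (lambda d, e: d + e, lambda d, e: d * e):   # S, then P
        dist = distribution(p, X)
        print(mean(dist), median(dist), mode(dist))
```

Run as written, this should report mean 7, median {7}, and mode {7} for S under both distributions, and mean 49/4 for P with median {10} and mode {6, 12} for fair dice, changing to median {8} and mode {6} for the loaded dice.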

Probability theorists have a special name and notation for the mean of a
random variable: They call it the expected value, and write

$$EX = \sum_{\omega\in\Omega} X(\omega)\,\Pr(\omega). \qquad (8.9)$$

In our dice-throwing example, this sum has 36 terms (one for each element
of Ω), while (8.6) is a sum of only eleven terms. But both sums have the
same value, because they’re both equal to

$$\sum_{\substack{\omega\in\Omega \\ x\in X(\Omega)}} x\,\Pr(\omega)\,[x = X(\omega)].$$
The mean of a random variable turns out to be more meaningful in
applications than the other kinds of averages, so we shall largely forget about
medians and modes from now on. We will use the terms “expected value,”
“mean,” and “average” almost interchangeably in the rest of this chapter.


I get it:
On average, “average”
means “mean.”


If X and Y are any two random variables defined on the same probability
space, then X + Y is also a random variable on that space. By formula (8.9),
the average of their sum is the sum of their averages:

$$E(X+Y) = \sum_{\omega\in\Omega} \bigl(X(\omega) + Y(\omega)\bigr)\Pr(\omega) = EX + EY. \qquad (8.10)$$

Similarly, if α is any constant we have the simple rule

$$E(\alpha X) = \alpha\,EX. \qquad (8.11)$$

But the corresponding rule for multiplication of random variables is more
complicated in general; the expected value is defined as a sum over elementary
events, and sums of products don’t often have a simple form. In spite of this
difficulty, there is a very nice formula for the mean of a product in the special
case that the random variables are independent:

$$E(XY) = (EX)(EY), \qquad\text{if X and Y are independent.} \qquad (8.12)$$

We can prove this by the distributive law for products,

$$E(XY) = \sum_{\omega\in\Omega} X(\omega)\,Y(\omega)\cdot\Pr(\omega)$$

$$\qquad = \sum_{\substack{x\in X(\Omega) \\ y\in Y(\Omega)}} xy\cdot\Pr(X=x \text{ and } Y=y)$$

$$\qquad = \sum_{\substack{x\in X(\Omega) \\ y\in Y(\Omega)}} xy\cdot\Pr(X=x)\,\Pr(Y=y)$$

$$\qquad = \sum_{x\in X(\Omega)} x\,\Pr(X=x) \cdot \sum_{y\in Y(\Omega)} y\,\Pr(Y=y) = (EX)(EY).$$

For example, we know that S = S_1 + S_2 and P = S_1 S_2, when S_1 and S_2 are
the numbers of spots on the first and second of a pair of random dice. We have
ES_1 = ES_2 = 7/2, hence ES = 7; furthermore S_1 and S_2 are independent, so
EP = 7/2 · 7/2 = 49/4, as claimed earlier. We also have E(S + P) = ES + EP = 7 + 49/4.
But S and P are not independent, so we cannot assert that E(SP) = 7 · 49/4 = 343/4.
In fact, the expected value of SP turns out to equal 637/6 in distribution Pr_00,
112 (exactly) in distribution Pr_11.
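The expected values quoted here follow from definition (8.9) by a 36-term sum. The Python sketch below evaluates them exactly for fair and loaded dice; the helper `expect` and the variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
fair = {d: Fraction(1, 6) for d in FACES}
loaded = {d: Fraction(1, 4) if d in (1, 6) else Fraction(1, 8) for d in FACES}

def expect(p, X):
    """EX by (8.9): sum of X(w) * Pr(w) over the 36 rolls w = (d, e)."""
    return sum(X(d, e) * p[d] * p[e] for d, e in product(FACES, FACES))

def S(d, e): return d + e
def P(d, e): return d * e
def SP(d, e): return (d + e) * (d * e)

for p in (fair, loaded):
    ES, EP = expect(p, S), expect(p, P)
    # Additivity (8.10) always holds; the product rule (8.12) needs independence.
    assert expect(p, lambda d, e: S(d, e) + P(d, e)) == ES + EP
    assert expect(p, SP) != ES * EP
    print(ES, EP, expect(p, SP))  # 7 49/4 637/6 for fair dice, 7 49/4 112 for loaded
```

The product rule still applies to the independent pair S_1, S_2, which is why EP equals (7/2)² under both distributions.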
(Slightly subtle
point:
There are two
probability spaces,
depending on what
strategy we use; but
EX_1 and EX_2 are
the same in both.)


8.2  MEAN  AND  VARIANCE 

The  next  most  important  property  of  a random  variable,  after  we 
know  its  expected  value,  is  its  variance,  defined  as  the  mean  square  deviation 
from  the  mean: 

\[ VX = E\bigl((X - EX)^2\bigr). \tag{8.13} \]

If we denote EX by \(\mu\), the variance VX is the expected value of \((X-\mu)^2\). This
measures the “spread” of X’s distribution.

As  a simple  example  of  variance  computation,  let’s  suppose  we  have  just 
been  made  an  offer  we  can’t  refuse:  Someone  has  given  us  two  gift  certificates 
for  a certain  lottery.  The  lottery  organizers  sell  100  tickets  for  each  weekly 
drawing. One of these tickets is selected by a uniformly random process
(that is, each ticket is equally likely to be chosen), and the lucky ticket holder
wins a hundred million dollars. The other 99 ticket holders win nothing.

We  can  use  our  gift  in  two  ways:  Either  we  buy  two  tickets  in  the  same 
lottery,  or  we  buy  one  ticket  in  each  of  two  lotteries.  Which  is  a better 
strategy? Let’s try to analyze this by letting \(X_1\) and \(X_2\) be random variables
that represent the amount we win on our first and second ticket. The expected
value of \(X_1\), in millions, is

\[ EX_1 = \tfrac{99}{100}\cdot 0 + \tfrac{1}{100}\cdot 100 = 1, \]

and the same holds for \(X_2\). Expected values are additive, so our average total
winnings will be

\[ E(X_1 + X_2) = EX_1 + EX_2 = 2 \text{ million dollars}, \]

regardless of which strategy we adopt.

Still,  the  two  strategies  seem  different.  Let’s  look  beyond  expected  values 
and study the exact probability distribution of \(X_1 + X_2\):

                          winnings (millions)
                            0       100     200
    same drawing          .9800   .0200
    different drawings    .9801   .0198   .0001

If  we  buy  two  tickets  in  the  same  lottery  we  have  a 98%  chance  of  winning 
nothing  and  a 2%  chance  of  winning  $100  million.  If  we  buy  them  in  different 
lotteries  we  have  a 98.01%  chance  of  winning  nothing,  so  this  is  slightly  more 
likely  than  before;  and  we  have  a 0.01%  chance  of  winning  $200  million,  also 
slightly  more  likely  than  before;  and  our  chances  of  winning  $100  million  are 
now 1.98%. So the distribution of \(X_1 + X_2\) in this second situation is slightly
more spread out; the middle value, $100 million, is slightly less likely, but the
extreme values are slightly more likely.

It’s  this  notion  of  the  spread  of  a random  variable  that  the  variance  is 
intended  to  capture.  We  measure  the  spread  in  terms  of  the  squared  deviation 
of  the  random  variable  from  its  mean.  In  case  1,  the  variance  is  therefore 

\[ .98(0M - 2M)^2 + .02(100M - 2M)^2 = 196M^2; \]

in case 2 it is

\[ .9801(0M - 2M)^2 + .0198(100M - 2M)^2 + .0001(200M - 2M)^2 = 198M^2. \]

As  we  expected,  the  latter  variance  is  slightly  larger,  because  the  distribution 
of  case  2 is  slightly  more  spread  out. 
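These two variances are easy to reproduce directly from definition (8.13). Here is a small sketch with amounts measured in millions; the dictionaries `same` and `different` simply transcribe the table of probabilities given earlier, and the function names are our own.

```python
from fractions import Fraction

def mean(dist):
    """Expected value of a finite distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    """Mean square deviation from the mean, as in definition (8.13)."""
    mu = mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

# Total winnings in millions for the two strategies.
same = {0: Fraction(98, 100), 100: Fraction(2, 100)}
different = {0: Fraction(9801, 10000),
             100: Fraction(198, 10000),
             200: Fraction(1, 10000)}

print(mean(same), variance(same))            # case 1
print(mean(different), variance(different))  # case 2
```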

When  we  work  with  variances,  everything  is  squared,  so  the  numbers  can 
get  pretty  big.  (The  factor  M2  is  one  trillion,  which  is  somewhat  imposing 
even  for  high-stakes  gamblers.)  To  convert  the  numbers  back  to  the  more 
meaningful  original  scale,  we  often  take  the  square  root  of  the  variance.  The 
resulting number is called the standard deviation, and it is usually denoted
by the Greek letter \(\sigma\):

\[ \sigma = \sqrt{VX}. \tag{8.14} \]

The standard deviations of the random variables \(X_1 + X_2\) in our two lottery
strategies are \(\sqrt{196M^2} = 14.00M\) and \(\sqrt{198M^2} \approx 14.071247M\). In some sense
the second alternative is about $71,247 riskier.

How  does  the  variance  help  us  choose  a strategy?  It’s  not  clear.  The 
strategy  with  higher  variance  is  a little  riskier;  but  do  we  get  the  most  for  our 
money  by  taking  more  risks  or  by  playing  it  safe?  Suppose  we  had  the  chance 
to  buy  100  tickets  instead  of  only  two.  Then  we  could  have  a guaranteed 
victory  in  a single  lottery  (and  the  variance  would  be  zero);  or  we  could 
gamble on a hundred different lotteries, with a \(.99^{100} \approx .366\) chance of winning
nothing  but  also  with  a nonzero  probability  of  winning  up  to  $10,000,000,000. 
To  decide  between  these  alternatives  is  beyond  the  scope  of  this  book;  all  we 
can  do  here  is  explain  how  to  do  the  calculations. 

In  fact,  there  is  a simpler  way  to  calculate  the  variance,  instead  of  using 
the  definition  (8.13).  (We  suspect  that  there  must  be  something  going  on 
in  the  mathematics  behind  the  scenes,  because  the  variances  in  the  lottery 
example  magically  came  out  to  be  integer  multiples  of  M2.)  We  have 

\[ \begin{aligned}
E\bigl((X - EX)^2\bigr) &= E\bigl(X^2 - 2X(EX) + (EX)^2\bigr) \\
&= E(X^2) - 2(EX)(EX) + (EX)^2,
\end{aligned} \]

since (EX) is a constant; hence

\[ VX = E(X^2) - (EX)^2. \tag{8.15} \]

“The variance is the mean of the square minus the square of the mean.”

Interesting: The variance of a dollar amount is expressed in units of square
dollars.

Another way to reduce risk might be to bribe the lottery officials.

I guess that’s where probability becomes indiscreet.

(N.B.: Opinions expressed in these margins do not necessarily represent
the opinions of the management.)

For example, the mean of \((X_1 + X_2)^2\) comes to \(.98(0M)^2 + .02(100M)^2 = 200M^2\)
or to \(.9801(0M)^2 + .0198(100M)^2 + .0001(200M)^2 = 202M^2\) in the
lottery problem. Subtracting \(4M^2\) (the square of the mean) gives the results
we obtained the hard way.

There’s  an  even  easier  formula  yet,  if  we  want  to  calculate  V(X+  Y)  when 
X and  Y are  independent:  We  have 

\[ \begin{aligned}
E\bigl((X + Y)^2\bigr) &= E(X^2 + 2XY + Y^2) \\
&= E(X^2) + 2(EX)(EY) + E(Y^2),
\end{aligned} \]

since we know that E(XY) = (EX)(EY) in the independent case. Therefore

\[ \begin{aligned}
V(X + Y) &= E\bigl((X + Y)^2\bigr) - (EX + EY)^2 \\
&= E(X^2) + 2(EX)(EY) + E(Y^2) - (EX)^2 - 2(EX)(EY) - (EY)^2 \\
&= E(X^2) - (EX)^2 + E(Y^2) - (EY)^2 \\
&= VX + VY.
\end{aligned} \tag{8.16} \]

“The  variance  of  a sum  of  independent  random  variables  is  the  sum  of  their 
variances.”  For  example,  the  variance  of  the  amount  we  can  win  with  a single 
lottery  ticket  is 

\[ E(X_1^2) - (EX_1)^2 = .99(0M)^2 + .01(100M)^2 - (1M)^2 = 99M^2. \]

Therefore the variance of the total winnings of two lottery tickets in two
separate (independent) lotteries is \(2 \times 99M^2 = 198M^2\). And the corresponding
variance for n independent lottery tickets is \(n \times 99M^2\).

The variance of the dice-roll sum S drops out of this same formula, since
\(S = S_1 + S_2\) is the sum of two independent random variables. We have

\[ VS_1 = \tfrac16\bigl(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2\bigr) - \bigl(\tfrac72\bigr)^2 = \tfrac{35}{12} \]

when the dice are fair; hence \(VS = \frac{35}{12} + \frac{35}{12} = \frac{35}{6}\). The loaded die has

\[ VS_1 = \tfrac18\bigl(2\cdot1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 2\cdot6^2\bigr) - \bigl(\tfrac72\bigr)^2 = \tfrac{45}{12}; \]

hence \(VS = \frac{45}{6} = 7.5\) when both dice are loaded. Notice that the loaded dice
give S a larger variance, although S actually assumes its average value 7 more
often than it would with fair dice. If our goal is to shoot lots of lucky 7’s, the
variance is not our best indicator of success.
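The dice figures can be confirmed with (8.15) and (8.16). The sketch below assumes the loaded die described earlier in the chapter gives faces 1 and 6 probability 1/4 each and the other faces 1/8 each; that assumption, and the helper names, are ours.

```python
from fractions import Fraction

def var(die):
    """Variance via (8.15): mean of the square minus square of the mean."""
    m1 = sum(p * s for s, p in die.items())
    m2 = sum(p * s * s for s, p in die.items())
    return m2 - m1 ** 2

def pr_seven(die):
    """Probability that two independent copies of `die` total 7."""
    return sum(die[s] * die[7 - s] for s in die)

fair = {s: Fraction(1, 6) for s in range(1, 7)}
# Assumed loading: faces 1 and 6 are twice as likely as the others.
loaded = {s: Fraction(1, 4) if s in (1, 6) else Fraction(1, 8)
          for s in range(1, 7)}

# By (8.16) the variance of the sum of two independent dice is 2 * var(die).
print(2 * var(fair), 2 * var(loaded))
print(pr_seven(fair), pr_seven(loaded))
```

The second line of output illustrates the remark above: the loaded pair hits 7 more often even though its variance is larger.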

OK,  we  have  learned  how  to  compute  variances.  But  we  haven’t  really 
seen  a good  reason  why  the  variance  is  a natural  thing  to  compute.  Everybody 
does it, but why? The main reason is Chebyshev’s inequality ([24’] and
[50,]), which states that the variance has a significant property:

\[ \Pr\bigl((X - EX)^2 \ge \alpha\bigr) \le VX/\alpha, \quad\text{for all } \alpha > 0. \tag{8.17} \]

If he proved it in 1867, it’s a classic ’67 Chebyshev.


(This is different from the summation inequalities of Chebyshev that we
encountered in Chapter 2.) Very roughly, (8.17) tells us that a random variable X
will rarely be far from its mean EX if its variance VX is small. The proof is
amazingly simple. We have


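\[ \begin{aligned}
VX &= \sum_{\omega\in\Omega} \bigl(X(\omega) - EX\bigr)^2 \Pr(\omega) \\
&\ge \sum_{\substack{\omega\in\Omega\\ (X(\omega)-EX)^2 \ge \alpha}} \bigl(X(\omega) - EX\bigr)^2 \Pr(\omega) \\
&\ge \sum_{\substack{\omega\in\Omega\\ (X(\omega)-EX)^2 \ge \alpha}} \alpha \Pr(\omega) = \alpha\cdot\Pr\bigl((X - EX)^2 \ge \alpha\bigr);
\end{aligned} \]

dividing by \(\alpha\) finishes the proof.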

If we write \(\mu\) for the mean and \(\sigma\) for the standard deviation, and if we
replace \(\alpha\) by \(c^2 VX\) in (8.17), the condition \((X - EX)^2 \ge c^2 VX\) is the same as
\((X - \mu)^2 \ge (c\sigma)^2\); hence (8.17) says that

\[ \Pr\bigl(|X - \mu| \ge c\sigma\bigr) \le 1/c^2. \tag{8.18} \]

Thus, X will lie within c standard deviations of its mean value except with
probability at most \(1/c^2\). A random variable will lie within \(2\sigma\) of \(\mu\) at least
75% of the time; it will lie between \(\mu - 10\sigma\) and \(\mu + 10\sigma\) at least 99% of the
time. These are the cases \(\alpha = 4VX\) and \(\alpha = 100VX\) of Chebyshev’s inequality.

If we roll a pair of fair dice n times, the total value of the n rolls will
almost always be near 7n, for large n. Here’s why: The variance of n
independent rolls is \(\frac{35}{6}n\). A variance of \(\frac{35}{6}n\) means a standard deviation of
only \(\sqrt{\frac{35}{6}n}\). So Chebyshev’s inequality tells us that the final sum will lie between

\[ 7n - 10\sqrt{\tfrac{35}{6}n} \quad\text{and}\quad 7n + 10\sqrt{\tfrac{35}{6}n} \]

in at least 99% of all experiments when n pairs of fair dice are rolled. For example,
the  odds  are  better  than  99  to  1 that  the  total  value  of  a million  rolls  will  be 
between  6.976  million  and  7.024  million. 
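The million-roll figures follow from plugging numbers into (8.18) with c = 10. A minimal sketch (the function name `chebyshev_interval` is our own) is:

```python
import math

def chebyshev_interval(n, c=10.0):
    """Interval that holds the total of n rolls of a pair of fair dice
    with probability at least 1 - 1/c**2, by Chebyshev's inequality."""
    mean = 7 * n                     # each pair has mean 7
    sigma = math.sqrt(35 * n / 6)    # each pair has variance 35/6
    return mean - c * sigma, mean + c * sigma

lo, hi = chebyshev_interval(10**6)
print(round(lo), round(hi))
```

The bound is quite conservative; Chebyshev's inequality uses nothing about the distribution except its mean and variance.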

In general, let X be any random variable over a probability space \(\Omega\),
having finite mean \(\mu\) and finite standard deviation \(\sigma\). Then we can consider the
probability space \(\Omega^n\) whose elementary events are n-tuples \((\omega_1, \omega_2, \ldots, \omega_n)\)
with each \(\omega_k \in \Omega\), and whose probabilities are

\[ \Pr(\omega_1, \omega_2, \ldots, \omega_n) = \Pr(\omega_1)\Pr(\omega_2)\ldots\Pr(\omega_n). \]

If we now define random variables \(X_k\) by the formula

\[ X_k(\omega_1, \omega_2, \ldots, \omega_n) = X(\omega_k), \]

the quantity

\[ X_1 + X_2 + \cdots + X_n \]

is a sum of n independent random variables, which corresponds to taking n
independent “samples” of X on \(\Omega\) and adding them together. The mean of
\(X_1 + X_2 + \cdots + X_n\) is \(n\mu\), and the standard deviation is \(\sqrt{n}\,\sigma\); hence the average
of the n samples,

\[ \frac1n(X_1 + X_2 + \cdots + X_n), \]

will lie between \(\mu - 10\sigma/\sqrt{n}\) and \(\mu + 10\sigma/\sqrt{n}\) at least 99% of the time. In
other words, if we choose a large enough value of n, the average of n
independent samples will almost always be very near the expected value EX. (An
even stronger theorem called the Strong Law of Large Numbers is proved in
textbooks of probability theory; but the simple consequence of Chebyshev’s
inequality that we have just derived is enough for our purposes.)

(That is, the average will fall between the stated limits in at least 99% of all
cases when we look at a set of n independent samples, for any fixed value
of n. Don’t misunderstand this as a statement about the averages of an
infinite sequence \(X_1, X_2, X_3, \ldots\) as n varies.)

Sometimes we don’t know the characteristics of a probability space, and
we want to estimate the mean of a random variable X by sampling its value
repeatedly. (For example, we might want to know the average temperature
at noon on a January day in San Francisco; or we may wish to know the
mean life expectancy of insurance agents.) If we have obtained independent
empirical observations \(X_1, X_2, \ldots, X_n\), we can guess that the true mean is
approximately

\[ \widehat{E}X = \frac{X_1 + X_2 + \cdots + X_n}{n}. \tag{8.19} \]

And we can also make an estimate of the variance, using the formula

\[ \widehat{V}X = \frac{X_1^2 + X_2^2 + \cdots + X_n^2}{n-1} - \frac{(X_1 + X_2 + \cdots + X_n)^2}{n(n-1)}. \tag{8.20} \]

The \((n-1)\)’s in this formula look like typographic errors; it seems they should
be n’s, as in (8.19), because the true variance VX is defined by expected values
in (8.15). Yet we get a better estimate with \(n - 1\) instead of n here, because
definition (8.20) implies that

\[ E(\widehat{V}X) = VX. \tag{8.21} \]

Here’s  why: 


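\[ \begin{aligned}
E(\widehat{V}X) &= \frac{1}{n-1}\, E\Bigl(\sum_{k=1}^{n} X_k^2 - \frac1n \sum_{j=1}^{n}\sum_{k=1}^{n} X_j X_k\Bigr) \\
&= \frac{1}{n-1} \Bigl(\sum_{k=1}^{n} E(X_k^2) - \frac1n \sum_{j=1}^{n}\sum_{k=1}^{n} E(X_j X_k)\Bigr) \\
&= \frac{1}{n-1} \Bigl(\sum_{k=1}^{n} E(X^2) - \frac1n \sum_{j=1}^{n}\sum_{k=1}^{n} \bigl(E(X)^2[j \ne k] + E(X^2)[j = k]\bigr)\Bigr) \\
&= \frac{1}{n-1} \Bigl(nE(X^2) - \frac1n\bigl(nE(X^2) + n(n-1)E(X)^2\bigr)\Bigr) \\
&= E(X^2) - E(X)^2 = VX.
\end{aligned} \]

(This derivation uses the independence of the observations when it replaces
\(E(X_j X_k)\) by \((EX)^2[j \ne k] + E(X^2)[j = k]\).)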

In practice, experimental results about a random variable X are usually
obtained by calculating a sample mean \(\hat\mu = \widehat{E}X\) and a sample standard
deviation \(\hat\sigma = \sqrt{\widehat{V}X}\), and presenting the answer in the form ‘\(\hat\mu \pm \hat\sigma/\sqrt{n}\)’. For
example, here are ten rolls of two supposedly fair dice:

[Figure: ten rolls of a pair of dice, with spot sums 7, 11, 8, 5, 4, 6, 10, 8, 8, 7]

The sample mean of the spot sum S is

\[ \hat\mu = (7 + 11 + 8 + 5 + 4 + 6 + 10 + 8 + 8 + 7)/10 = 7.4; \]

the sample variance is

\[ (7^2 + 11^2 + 8^2 + 5^2 + 4^2 + 6^2 + 10^2 + 8^2 + 8^2 + 7^2 - 10\hat\mu^2)/9 \approx 2.1^2. \]

We estimate the average spot sum of these dice to be \(7.4 \pm 2.1/\sqrt{10} = 7.4 \pm 0.7\),
on the basis of these experiments.

Not to be confused with a Fibonacci number.

One the average.
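The same estimate can be reproduced with Python's standard `statistics` module, whose `variance` function uses the \(n - 1\) denominator of (8.20). This is only a sketch of the arithmetic; the variable names are ours.

```python
import math
import statistics

# Spot sums of the ten observed rolls.
rolls = [7, 11, 8, 5, 4, 6, 10, 8, 8, 7]

mu_hat = statistics.mean(rolls)         # sample mean, as in (8.19)
var_hat = statistics.variance(rolls)    # sample variance, n - 1 denominator
sigma_hat = math.sqrt(var_hat)          # sample standard deviation
err = sigma_hat / math.sqrt(len(rolls)) # the +/- term in the reported answer

print(f"{mu_hat:.1f} +/- {err:.1f}")
```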

Let’s  work  one  more  example  of  means  and  variances,  in  order  to  show 
how  they  can  be  calculated  theoretically  instead  of  empirically.  One  of  the 
questions we considered in Chapter 5 was the “football victory problem,”
where n hats are thrown into the air and the result is a random permutation
of hats. We showed in equation (5.51) that there’s a probability of \(n¡/n! \approx 1/e\)
that nobody gets the right hat back. We also derived the formula

\[ P(n,k) = \frac{1}{n!}\binom{n}{k}(n-k)¡ = \frac{1}{k!}\,\frac{(n-k)¡}{(n-k)!} \tag{8.22} \]

for the probability that exactly k people end up with their own hats.

Restating these results in the formalism just learned, we can consider the
probability space \(\Pi_n\) of all n! permutations \(\pi\) of {1, 2, ..., n}, where \(\Pr(\pi) = 1/n!\)
for all \(\pi \in \Pi_n\). The random variable

\[ F_n(\pi) = \text{number of “fixed points” of } \pi, \quad\text{for } \pi \in \Pi_n, \]

measures the number of correct hat-falls in the football victory problem.
Equation (8.22) gives \(\Pr(F_n = k)\), but let’s pretend that we don’t know any
such formula; we merely want to study the average value of \(F_n\), and its
standard deviation.

The  average  value  is,  in  fact,  extremely  easy  to  calculate,  avoiding  all  the 
complexities  of  Chapter  5.  We  simply  observe  that 

\[ \begin{aligned}
F_n(\pi) &= F_{n,1}(\pi) + F_{n,2}(\pi) + \cdots + F_{n,n}(\pi), \\
F_{n,k}(\pi) &= [\text{position } k \text{ of } \pi \text{ is a fixed point}], \quad\text{for } \pi \in \Pi_n.
\end{aligned} \]

Hence

\[ EF_n = EF_{n,1} + EF_{n,2} + \cdots + EF_{n,n}. \]

And the expected value of \(F_{n,k}\) is simply the probability that \(F_{n,k} = 1\), which
is 1/n because exactly (n − 1)! of the n! permutations \(\pi = \pi_1\pi_2\ldots\pi_n \in \Pi_n\)
have \(\pi_k = k\). Therefore

\[ EF_n = n/n = 1, \quad\text{for } n > 0. \tag{8.23} \]

On  the  average,  one  hat  will  be  in  its  correct  place.  “A  random  permutation 
has  one  fixed  point,  on  the  average.” 

Now what’s the standard deviation? This question is more difficult,
because the \(F_{n,k}\)’s are not independent of each other. But we can calculate the
variance by analyzing the mutual dependencies among them:

\[ \begin{aligned}
E(F_n^2) &= E\Bigl(\Bigl(\sum_{k=1}^{n} F_{n,k}\Bigr)^2\Bigr) = E\Bigl(\sum_{j=1}^{n}\sum_{k=1}^{n} F_{n,j}F_{n,k}\Bigr) \\
&= \sum_{j=1}^{n}\sum_{k=1}^{n} E(F_{n,j}F_{n,k}) = \sum_{1\le k\le n} E(F_{n,k}^2) + 2\sum_{1\le j<k\le n} E(F_{n,j}F_{n,k}).
\end{aligned} \]

(We used a similar trick when we derived (2.33) in Chapter 2.) Now \(F_{n,k}^2 = F_{n,k}\),
since \(F_{n,k}\) is either 0 or 1; hence \(E(F_{n,k}^2) = EF_{n,k} = 1/n\) as before. And
if j < k we have \(E(F_{n,j}F_{n,k}) = \Pr(\pi\) has both j and k as fixed points\() =
(n-2)!/n! = 1/n(n-1)\). Therefore

\[ E(F_n^2) = \frac{n}{n} + \binom{n}{2}\frac{2}{n(n-1)} = 2, \quad\text{for } n \ge 2. \tag{8.24} \]

(As a check when n = 3, we have \(\frac26 0^2 + \frac36 1^2 + \frac06 2^2 + \frac16 3^2 = 2\).) The variance
is \(E(F_n^2) - (EF_n)^2 = 1\), so the standard deviation (like the mean) is 1. “A
random permutation of \(n \ge 2\) elements has 1 ± 1 fixed points.”
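For small n the claim “1 ± 1 fixed points” can be confirmed by listing all n! permutations. The following sketch does exactly that; the function name `fixed_point_stats` is ours, and the enumeration is only practical for small n.

```python
from fractions import Fraction
from itertools import permutations

def fixed_point_stats(n):
    """Return (mean, variance) of the number of fixed points of a
    uniformly random permutation of n elements, by full enumeration."""
    counts = [sum(1 for i, v in enumerate(p) if i == v)
              for p in permutations(range(n))]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    second = Fraction(sum(c * c for c in counts), total)
    return mean, second - mean ** 2

for n in range(1, 7):
    print(n, *fixed_point_stats(n))
```

The case n = 1 prints variance 0, which is why (8.24) needs n ≥ 2.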


8.3  PROBABILITY  GENERATING  FUNCTIONS 

If  X is  a random  variable  that  takes  only  nonnegative  integer  values, 
we  can  capture  its  probability  distribution  nicely  by  using  the  techniques  of 
Chapter  7.  The  probability  generating  function  or  pgf  of  X is 

\[ G_X(z) = \sum_{k\ge0} \Pr(X=k)\,z^k. \tag{8.25} \]

This power series in z contains all the information about the random
variable X. We can also express it in two other ways:

\[ G_X(z) = \sum_{\omega\in\Omega} \Pr(\omega)\,z^{X(\omega)} = E(z^X). \tag{8.26} \]

The coefficients of \(G_X(z)\) are nonnegative, and they sum to 1; the latter
condition can be written

\[ G_X(1) = 1. \tag{8.27} \]

Conversely, any power series G(z) with nonnegative coefficients and with
G(1) = 1 is the pgf of some random variable.

The nicest thing about pgf’s is that they usually simplify the computation
of means and variances. For example, the mean is easily expressed:

\[ \begin{aligned}
EX &= \sum_{k\ge0} k\cdot\Pr(X=k) \\
&= \sum_{k\ge0} \Pr(X=k)\cdot k z^{k-1}\,\Big|_{z=1} \\
&= G_X'(1).
\end{aligned} \tag{8.28} \]

We simply differentiate the pgf with respect to z and set z = 1.

The variance is only slightly more complicated:

\[ \begin{aligned}
E(X^2) &= \sum_{k\ge0} k^2\cdot\Pr(X=k) \\
&= \sum_{k\ge0} \Pr(X=k)\cdot\bigl(k(k-1)z^{k-2} + kz^{k-1}\bigr)\,\Big|_{z=1} = G_X''(1) + G_X'(1).
\end{aligned} \]

Therefore

\[ VX = G_X''(1) + G_X'(1) - G_X'(1)^2. \tag{8.29} \]

Equations (8.28) and (8.29) tell us that we can compute the mean and variance
if we can compute the values of two derivatives, \(G_X'(1)\) and \(G_X''(1)\). We don’t
have to know a closed form for the probabilities; we don’t even have to know
a closed form for \(G_X(z)\) itself.

It is convenient to write

\[ \mathrm{Mean}(G) = G'(1), \tag{8.30} \]
\[ \mathrm{Var}(G) = G''(1) + G'(1) - G'(1)^2, \tag{8.31} \]

when G is any function, since we frequently want to compute these
combinations of derivatives.
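When G is a polynomial given by its coefficient list, Mean(G) and Var(G) reduce to finite sums. Here is a minimal sketch; the list representation (index k holds the coefficient of z^k) and the function names are our own conventions.

```python
from fractions import Fraction

def derivative_at_one(coeffs, m):
    """m-th derivative at z = 1 of the polynomial sum(coeffs[k] * z**k)."""
    total = 0
    for k, c in enumerate(coeffs):
        falling = 1
        for i in range(m):
            falling *= k - i          # k (k-1) ... (k-m+1)
        total += c * falling
    return total

def pgf_mean(coeffs):
    """Mean(G) = G'(1), as in (8.30)."""
    return derivative_at_one(coeffs, 1)

def pgf_var(coeffs):
    """Var(G) = G''(1) + G'(1) - G'(1)^2, as in (8.31)."""
    d1 = derivative_at_one(coeffs, 1)
    d2 = derivative_at_one(coeffs, 2)
    return d2 + d1 - d1 ** 2

# pgf of one fair die: (z + z^2 + ... + z^6) / 6.
die = [0] + [Fraction(1, 6)] * 6
print(pgf_mean(die), pgf_var(die))
```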

The second-nicest thing about pgf’s is that they are comparatively
simple functions of z, in many important cases. For example, let’s look at the
uniform distribution of order n, in which the random variable takes on each
of the values {0, 1, ..., n − 1} with probability 1/n. The pgf in this case is

\[ U_n(z) = \frac1n\bigl(1 + z + \cdots + z^{n-1}\bigr) = \frac1n\,\frac{1 - z^n}{1 - z}, \quad\text{for } n \ge 1. \tag{8.32} \]

We have a closed form for \(U_n(z)\) because this is a geometric series.

But this closed form proves to be somewhat embarrassing: When we plug
in z = 1 (the value of z that’s most critical for the pgf), we get the undefined
ratio 0/0, even though \(U_n(z)\) is a polynomial that is perfectly well defined
at any value of z. The value \(U_n(1) = 1\) is obvious from the non-closed form
\((1 + z + \cdots + z^{n-1})/n\), yet it seems that we must resort to L’Hospital’s rule
to find \(\lim_{z\to1} U_n(z)\) if we want to determine \(U_n(1)\) from the closed form.
The determination of \(U_n'(1)\) by L’Hospital’s rule will be even harder, because
there will be a factor of \((z-1)^2\) in the denominator; \(U_n''(1)\) will be harder still.

Luckily there’s a nice way out of this dilemma. If \(G(z) = \sum_{n\ge0} g_n z^n\) is
any power series that converges for at least one value of z with |z| > 1, the
power series \(G'(z) = \sum_{n\ge0} n g_n z^{n-1}\) will also have this property, and so will
G''(z), G'''(z), etc. Therefore by Taylor’s theorem we can write

\[ G(1+t) = G(1) + \frac{G'(1)}{1!}t + \frac{G''(1)}{2!}t^2 + \frac{G'''(1)}{3!}t^3 + \cdots; \tag{8.33} \]

all derivatives of G(z) at z = 1 will appear as coefficients, when G(1 + t) is
expanded in powers of t.

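For example, the derivatives of the uniform pgf \(U_n(z)\) are easily found
in this way:

\[ \begin{aligned}
U_n(1+t) &= \frac1n\,\frac{(1+t)^n - 1}{t} \\
&= \frac1n\binom{n}{1} + \frac1n\binom{n}{2}t + \frac1n\binom{n}{3}t^2 + \cdots + \frac1n\binom{n}{n}t^{n-1}.
\end{aligned} \]

Comparing this to (8.33) gives

\[ U_n(1) = 1; \qquad U_n'(1) = \frac{n-1}{2}; \qquad U_n''(1) = \frac{(n-1)(n-2)}{3}; \tag{8.34} \]

and in general \(U_n^{(m)}(1) = (n-1)^{\underline{m}}/(m+1)\), although we need only the cases
m = 1 and m = 2 to compute the mean and the variance. The mean of the
uniform distribution is

\[ U_n'(1) = \frac{n-1}{2}, \tag{8.35} \]

and the variance is

\[ U_n''(1) + U_n'(1) - U_n'(1)^2 = 4\,\frac{(n-1)(n-2)}{12} + 6\,\frac{(n-1)}{12} - 3\,\frac{(n-1)^2}{12} = \frac{n^2-1}{12}. \tag{8.36} \]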

The third-nicest thing about pgf’s is that the product of pgf’s corresponds
to the sum of independent random variables. We learned in Chapters 5 and 7
that the product of generating functions corresponds to the convolution of
sequences; but it’s even more important in applications to know that the
convolution of probabilities corresponds to the sum of independent random
variables. Indeed, if X and Y are random variables that take on nothing but
integer values, the probability that X + Y = n is

\[ \Pr(X + Y = n) = \sum_k \Pr(X = k \text{ and } Y = n - k). \]

If X and Y are independent, we now have

\[ \Pr(X + Y = n) = \sum_k \Pr(X = k)\Pr(Y = n - k), \]

a convolution. Therefore (and this is the punch line)

\[ G_{X+Y}(z) = G_X(z)\,G_Y(z), \quad\text{if X and Y are independent.} \tag{8.37} \]

I’ll graduate magna cum ulant.

Earlier  this  chapter  we  observed  that  V(  X + Y)  = VX  + VY  when  X and  Y are 
independent.  Let  F(z)  and  G(z)  be  the  pgf’ s for  X and  Y,  and  let  H(z)  be  the 
pgf  for  X + Y.  Then 

\[ H(z) = F(z)G(z), \]

and our formulas (8.28) through (8.31) for mean and variance tell us that we
must have

\[ \mathrm{Mean}(H) = \mathrm{Mean}(F) + \mathrm{Mean}(G); \tag{8.38} \]
\[ \mathrm{Var}(H) = \mathrm{Var}(F) + \mathrm{Var}(G). \tag{8.39} \]

These formulas, which are properties of the derivatives Mean(H) = H'(1) and
Var(H) = H''(1) + H'(1) − H'(1)^2, aren’t valid for arbitrary function products
H(z) = F(z)G(z); we have

\[ H'(z) = F'(z)G(z) + F(z)G'(z), \]
\[ H''(z) = F''(z)G(z) + 2F'(z)G'(z) + F(z)G''(z). \]

But if we set z = 1, we can see that (8.38) and (8.39) will be valid in general
provided only that

\[ F(1) = G(1) = 1 \tag{8.40} \]

and that the derivatives exist. The “probabilities” don’t have to be in [0..1]
for these formulas to hold. We can normalize the functions F(z) and G(z)
by dividing through by F(1) and G(1) in order to make this condition valid,
whenever F(1) and G(1) are nonzero.
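Both points can be seen numerically: multiplying coefficient lists is a convolution, and Mean and Var add even when a factor is not a genuine pgf, as long as it equals 1 at z = 1. The sketch below uses polynomial factors chosen by us purely for illustration.

```python
from fractions import Fraction

def multiply(f, g):
    """Coefficient list of F(z)G(z); this is the convolution of f and g."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def mean(c):
    """Mean(G) = G'(1) for the polynomial with coefficient list c."""
    return sum(k * p for k, p in enumerate(c))

def var(c):
    """Var(G) = G''(1) + G'(1) - G'(1)^2."""
    d1 = mean(c)
    d2 = sum(k * (k - 1) * p for k, p in enumerate(c))
    return d2 + d1 - d1 * d1

die = [0] + [Fraction(1, 6)] * 6           # one fair die
coin = [Fraction(1, 3), Fraction(2, 3)]    # heads with probability 2/3
odd = [Fraction(2), Fraction(-1)]          # 2 - z: not a pgf, but equals 1 at z = 1

for f in (coin, odd):
    h = multiply(f, die)
    print(mean(h) == mean(f) + mean(die), var(h) == var(f) + var(die))
```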

Mean and variance aren’t the whole story. They are merely two of an
infinite series of so-called cumulant statistics introduced by the Danish
astronomer Thorvald Nicolai Thiele [288] in 1903. The first two cumulants
\(\kappa_1\) and \(\kappa_2\) of a random variable are what we have called the mean and the
variance; there also are higher-order cumulants that express more subtle
properties of a distribution. The general formula

\[ \ln G(e^t) = \frac{\kappa_1}{1!}t + \frac{\kappa_2}{2!}t^2 + \frac{\kappa_3}{3!}t^3 + \frac{\kappa_4}{4!}t^4 + \cdots \tag{8.41} \]

defines the cumulants of all orders, when G(z) is the pgf of a random variable.
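Let’s look at cumulants more closely. If G(z) is the pgf for X, we have

\[ \begin{aligned}
G(e^t) &= \sum_{k\ge0} \Pr(X=k)\,e^{kt} = \sum_{k,m\ge0} \Pr(X=k)\,\frac{k^m t^m}{m!} \\
&= 1 + \frac{\mu_1}{1!}t + \frac{\mu_2}{2!}t^2 + \frac{\mu_3}{3!}t^3 + \cdots,
\end{aligned} \tag{8.42} \]

where

\[ \mu_m = \sum_{k\ge0} k^m \Pr(X=k) = E(X^m). \tag{8.43} \]

This quantity \(\mu_m\) is called the “mth moment” of X. We can take exponentials
on both sides of (8.41), obtaining another formula for \(G(e^t)\):

\[ \begin{aligned}
G(e^t) &= 1 + \frac{(\kappa_1 t + \frac12\kappa_2 t^2 + \cdots)}{1!} + \frac{(\kappa_1 t + \frac12\kappa_2 t^2 + \cdots)^2}{2!} + \cdots \\
&= 1 + \kappa_1 t + \tfrac12(\kappa_2 + \kappa_1^2)t^2 + \cdots.
\end{aligned} \]

Equating coefficients of powers of t leads to a series of formulas

\[ \kappa_1 = \mu_1, \tag{8.44} \]
\[ \kappa_2 = \mu_2 - \mu_1^2, \tag{8.45} \]
\[ \kappa_3 = \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3, \tag{8.46} \]
\[ \kappa_4 = \mu_4 - 4\mu_1\mu_3 + 12\mu_1^2\mu_2 - 3\mu_2^2 - 6\mu_1^4, \tag{8.47} \]
\[ \kappa_5 = \mu_5 - 5\mu_1\mu_4 + 20\mu_1^2\mu_3 - 10\mu_2\mu_3 + 30\mu_1\mu_2^2 - 60\mu_1^3\mu_2 + 24\mu_1^5, \tag{8.48} \]

defining the cumulants in terms of the moments. Notice that \(\kappa_2\) is indeed the
variance, \(E(X^2) - (EX)^2\), as claimed.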

Equation  (8.41)  makes  it  clear  that  the  cumulants  defined  by  the  product 
F(z)  G (z)  of  two  pgfs  will  be  the  sums  of  the  corresponding  cumulants  of  F(z) 
and  G(z),  because  logarithms  of  products  are  sums.  Therefore  all  cumulants 
of  the  sum  of  independent  random  variables  are  additive,  just  as  the  mean  and 
variance  are.  This  property  makes  cumulants  more  important  than  moments. 
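Additivity of the first four cumulants can be tested directly from the moment formulas (8.44) through (8.47). The sketch below uses a fair die and a biased coin of our own choosing as the two independent variables; all function names are ours.

```python
from fractions import Fraction

def moments(dist, upto):
    """[mu_0, mu_1, ..., mu_upto] for a distribution {value: probability}."""
    return [sum(p * x ** m for x, p in dist.items()) for m in range(upto + 1)]

def cumulants(dist):
    """First four cumulants from the moments, per (8.44)-(8.47)."""
    _, m1, m2, m3, m4 = moments(dist, 4)
    k1 = m1
    k2 = m2 - m1 ** 2
    k3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    k4 = m4 - 4 * m1 * m3 + 12 * m1 ** 2 * m2 - 3 * m2 ** 2 - 6 * m1 ** 4
    return [k1, k2, k3, k4]

def add_independent(dx, dy):
    """Distribution of X + Y when X and Y are independent."""
    out = {}
    for x, p in dx.items():
        for y, q in dy.items():
            out[x + y] = out.get(x + y, 0) + p * q
    return out

die = {s: Fraction(1, 6) for s in range(1, 7)}
coin = {0: Fraction(1, 3), 1: Fraction(2, 3)}

total = cumulants(add_independent(die, coin))
parts = [a + b for a, b in zip(cumulants(die), cumulants(coin))]
print(total == parts)
```

The raw moments of the sum do not split this way, which is the sense in which cumulants are the more convenient statistics.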


"For  these  higher 
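half-invariants we shall propose no special names.”
-T. N. Thiele [288]

If we take a slightly different tack, writing

\[ G(1+t) = 1 + \frac{\alpha_1}{1!}t + \frac{\alpha_2}{2!}t^2 + \frac{\alpha_3}{3!}t^3 + \cdots, \]

equation (8.33) tells us that the \(\alpha\)’s are the “factorial moments”

\[ \begin{aligned}
\alpha_m &= G^{(m)}(1) \\
&= \sum_{k\ge0} \Pr(X=k)\,k^{\underline{m}} z^{k-m}\,\Big|_{z=1} \\
&= \sum_{k\ge0} k^{\underline{m}} \Pr(X=k) \\
&= E(X^{\underline{m}}).
\end{aligned} \tag{8.49} \]

It follows that

\[ \begin{aligned}
G(e^t) &= 1 + \frac{\alpha_1}{1!}(e^t - 1) + \frac{\alpha_2}{2!}(e^t - 1)^2 + \cdots \\
&= 1 + \frac{\alpha_1}{1!}\bigl(t + \tfrac12 t^2 + \cdots\bigr) + \frac{\alpha_2}{2!}\bigl(t^2 + t^3 + \cdots\bigr) + \cdots \\
&= 1 + \alpha_1 t + \tfrac12(\alpha_2 + \alpha_1)t^2 + \cdots,
\end{aligned} \]

and we can express the cumulants in terms of the derivatives \(G^{(m)}(1)\):

\[ \kappa_1 = \alpha_1, \tag{8.50} \]
\[ \kappa_2 = \alpha_2 + \alpha_1 - \alpha_1^2, \tag{8.51} \]
\[ \kappa_3 = \alpha_3 + 3\alpha_2 + \alpha_1 - 3\alpha_2\alpha_1 - 3\alpha_1^2 + 2\alpha_1^3. \tag{8.52} \]

This sequence of formulas yields “additive” identities that extend (8.38) and
(8.39) to all the cumulants.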

Let’s get back down to earth and apply these ideas to simple examples.
The simplest case of a random variable is a “random constant,” where X has
a certain fixed value x with probability 1. In this case \(G_X(z) = z^x\), and
\(\ln G_X(e^t) = xt\); hence the mean is x and all other cumulants are zero. It
follows that the operation of multiplying any pgf by \(z^x\) increases the mean
by x but leaves the variance and all other cumulants unchanged.

How do probability generating functions apply to dice? The distribution
of spots on one fair die has the pgf

\[ G(z) = \frac{z + z^2 + z^3 + z^4 + z^5 + z^6}{6} = zU_6(z), \]

where \(U_6\) is the pgf for the uniform distribution of order 6. The factor ‘z’
adds 1 to the mean, so the mean is 3.5 instead of \(\frac{n-1}{2} = 2.5\) as given in (8.35);
but an extra ‘z’ does not affect the variance (8.36), which equals \(\frac{35}{12}\).

The pgf for total spots on two independent dice is the square of the pgf
for spots on one die,

\[ G_S(z) = \frac{z^2 + 2z^3 + 3z^4 + 4z^5 + 5z^6 + 6z^7 + 5z^8 + 4z^9 + 3z^{10} + 2z^{11} + z^{12}}{36} = z^2 U_6(z)^2. \]

If we roll a pair of fair dice n times, the probability that we get a total of
k spots overall is, similarly,

\[ [z^k]\,G_S(z)^n = [z^k]\,z^{2n}U_6(z)^{2n} = [z^{k-2n}]\,U_6(z)^{2n}. \]

In the hats-off-to-football-victory problem considered earlier, otherwise
known as the problem of enumerating the fixed points of a random
permutation, we know from (5.49) that the pgf is

\[ F_n(z) = \sum_{0\le k\le n} \frac{(n-k)¡}{(n-k)!}\,\frac{z^k}{k!}, \quad\text{for } n \ge 0. \tag{8.53} \]

Hat distribution is a different kind of uniform distribution.

Therefore

\[ \begin{aligned}
F_n'(z) &= \sum_{1\le k\le n} \frac{(n-k)¡}{(n-k)!}\,\frac{z^{k-1}}{(k-1)!} \\
&= \sum_{0\le k\le n-1} \frac{(n-1-k)¡}{(n-1-k)!}\,\frac{z^k}{k!} \\
&= F_{n-1}(z).
\end{aligned} \]

Without knowing the details of the coefficients, we can conclude from this
recurrence \(F_n'(z) = F_{n-1}(z)\) that \(F_n^{(m)}(z) = F_{n-m}(z)\); hence

\[ F_n^{(m)}(1) = F_{n-m}(1) = [n \ge m]. \tag{8.54} \]

This formula makes it easy to calculate the mean and variance; we find as
before (but more quickly) that they are both equal to 1 when \(n \ge 2\).
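The recurrence between successive hat pgf's can be checked coefficient by coefficient. This sketch computes subfactorials with the standard recurrence \(n¡ = n\,(n-1)¡ + (-1)^n\), which is an outside fact not derived in this section; the function names are ours.

```python
from fractions import Fraction
from math import factorial

def subfactorial(n):
    """Number of permutations of n objects with no fixed points."""
    d = 1                                  # 0 objects: one (empty) derangement
    for i in range(1, n + 1):
        d = i * d + (-1) ** i
    return d

def hat_pgf(n):
    """Coefficient list of F_n(z) from (8.53); index k is Pr(F_n = k)."""
    return [Fraction(subfactorial(n - k), factorial(n - k) * factorial(k))
            for k in range(n + 1)]

def differentiate(coeffs):
    """Coefficient list of the derivative of a polynomial."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

for n in range(1, 9):
    print(n, differentiate(hat_pgf(n)) == hat_pgf(n - 1))
```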

In fact, we can now show that the mth cumulant \(\kappa_m\) of this random
variable is equal to 1 whenever \(n \ge m\). For the mth cumulant depends only
on \(F_n'(1), F_n''(1), \ldots, F_n^{(m)}(1)\), and these are all equal to 1; hence we obtain
the same answer for the mth cumulant as we do when we replace \(F_n(z)\) by
the limiting pgf

\[ F_\infty(z) = e^{z-1}, \tag{8.55} \]

which has \(F_\infty^{(m)}(1) = 1\) for derivatives of all orders. The cumulants of \(F_\infty\) are
identically equal to 1, because

\[ \ln F_\infty(e^t) = \ln e^{e^t - 1} = e^t - 1 = \frac{t}{1!} + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots. \]

Con artists know that p ≈ 0.1 when you spin a newly minted U.S. penny on
a smooth table. (The weight distribution makes Lincoln’s head fall downward.)

Now let’s turn to processes that have just two outcomes. If we flip
a coin, there’s probability p that it comes up heads and probability q that it
comes up tails, where

\[ p + q = 1. \]

(We assume that the coin doesn’t come to rest on its edge, or fall into a hole,
etc.) Throughout this section, the numbers p and q will always sum to 1. If
the coin is fair, we have \(p = q = \frac12\); otherwise the coin is said to be biased.

The  probability  generating  function  for  the  number  of  heads  after  one 
toss  of  a coin  is 


\[ H(z) = q + pz. \tag{8.56} \]

If we toss the coin n times, always assuming that different coin tosses are
independent, the number of heads is generated by

\[ H(z)^n = (q + pz)^n = \sum_{k\ge0} \binom{n}{k} p^k q^{n-k} z^k, \tag{8.57} \]

according to the binomial theorem. Thus, the chance that we obtain exactly k
heads in n tosses is \(\binom{n}{k} p^k q^{n-k}\). This sequence of probabilities is called the
binomial distribution.

Suppose  we  toss  a coin  repeatedly  until  heads  first  turns  up.  What  is 
the  probability  that  exactly  k tosses  will  be  required?  We  have  k = 1 with 
probability  p (since  this  is  the  probability  of  heads  on  the  first  flip);  we  have 
k = 2 with  probability  qp  (since  this  is  the  probability  of  tails  first,  then 
heads); and for general k the probability is \(q^{k-1}p\). So the generating function
is

\[ pz + qpz^2 + q^2pz^3 + \cdots = \frac{pz}{1 - qz}. \tag{8.58} \]

Repeating the process until n heads are obtained gives the pgf

\[ \Bigl(\frac{pz}{1 - qz}\Bigr)^n = p^n z^n \sum_k \binom{n+k-1}{k}(qz)^k = \sum_k \binom{k-1}{k-n} p^n q^{k-n} z^k. \tag{8.59} \]

This, incidentally, is \(z^n\) times

\[ \Bigl(\frac{p}{1 - qz}\Bigr)^n = \sum_k \binom{n+k-1}{k} p^n q^k z^k, \tag{8.60} \]

the generating function for the negative binomial distribution.

The  probability  space  in  example  (8.59),  where  we  flip  a coin  until 
n heads  have  appeared,  is  different  from  the  probability  spaces  we’ve  seen 
earlier in this chapter, because it contains infinitely many elements. Each
element is a finite sequence of heads and/or tails, containing precisely n heads
in all, and ending with heads; the probability of such a sequence is \(p^n q^{k-n}\),
where \(k - n\) is the number of tails. Thus, for example, if n = 3 and if we
write H for heads and T for tails, the sequence THTTTHH is an element of the
probability space, and its probability is \(qpqqqpp = p^3q^4\).

Let  X be  a random  variable  with  the  binomial  distribution  (8.57),  and  let 
Y be  a random  variable  with  the  negative  binomial  distribution  (8.60).  These 
distributions depend on $n$ and $p$. The mean of $X$ is $nH'(1) = np$, since its pgf is $H(z)^n$; the variance is


$$n\bigl(H''(1) + H'(1) - H'(1)^2\bigr) = n(0 + p - p^2) = npq. \tag{8.61}$$

Heads I win,
tails you lose.

No? OK; tails you
lose, heads I win.

No? Well, then,
heads you lose,
tails I win.


Thus the standard deviation is $\sqrt{npq}$: If we toss a coin $n$ times, we expect to get heads about $np \pm \sqrt{npq}$ times. The mean and variance of $Y$ can be found in a similar way: If we let

$$G(z) = \frac{p}{1-qz},$$

we have

$$G'(z) = \frac{pq}{(1-qz)^2}, \qquad G''(z) = \frac{2pq^2}{(1-qz)^3};$$

hence $G'(1) = pq/p^2 = q/p$ and $G''(1) = 2pq^2/p^3 = 2q^2/p^2$. It follows that the mean of $Y$ is $nq/p$ and the variance is $nq/p^2$.

A simpler way to derive the mean and variance of $Y$ is to use the reciprocal generating function

$$F(z) = \frac{1-qz}{p} = \frac1p - \frac qp\,z, \tag{8.62}$$

and to write

$$G(z)^n = F(z)^{-n}. \tag{8.63}$$


The  probability  is 
negative  that  I'm 
getting  younger. 

Oh?  Then  it’s  >1 
that  you're  getting 
older,  or  staying 
the  same. 


This  polynomial  F(z)  is  not  a probability  generating  function,  because  it  has 
a negative  coefficient.  But  it  does  satisfy  the  crucial  condition  F(1)  = 1. 
Thus  F(z)  is  formally  a binomial  that  corresponds  to  a coin  for  which  we 
get  heads  with  “probability”  equal  to  -q/p;  and  G(z)  is  formally  equivalent 
to  flipping  such  a coin  -1  times(!).  The  negative  binomial  distribution 
with  parameters  (n,p)  can  therefore  be  regarded  as  the  ordinary  binomial 
distribution  with  parameters  (n’,  p’)  = (-n,  -q/p).  Proceeding  formally, 
the mean must be $n'p' = (-n)(-q/p) = nq/p$, and the variance must be $n'p'q' = (-n)(-q/p)(1 + q/p) = nq/p^2$. This formal derivation involving negative probabilities is valid, because our derivation for ordinary binomials was based on identities between formal power series in which the assumption $0 \le p \le 1$ was never used.

Let's move on to another example: How many times do we have to flip a coin until we get heads twice in a row? The probability space now consists of all sequences of H's and T's that end with HH but have no consecutive H's until the final position:


$$\Omega = \{\mathrm{HH}, \mathrm{THH}, \mathrm{TTHH}, \mathrm{HTHH}, \mathrm{TTTHH}, \mathrm{THTHH}, \mathrm{HTTHH}, \ldots\}.$$


The  probability  of  any  given  sequence  is  obtained  by  replacing  H by  p and  T 
by  q ; for  example,  the  sequence  THTHH  will  occur  with  probability 

$$\Pr(\mathrm{THTHH}) = qpqpp = p^3q^2.$$


We  can  now  play  with  generating  functions  as  we  did  at  the  beginning 
of  Chapter  7,  letting  S be  the  infinite  sum 

$$S = \mathrm{HH} + \mathrm{THH} + \mathrm{TTHH} + \mathrm{HTHH} + \mathrm{TTTHH} + \mathrm{THTHH} + \mathrm{HTTHH} + \cdots$$

of all the elements of $\Omega$. If we replace each H by $pz$ and each T by $qz$, we get
the  probability  generating  function  for  the  number  of  flips  needed  until  two 
consecutive heads turn up.

There's a curious relation between $S$ and the sum of domino tilings

T = | + ▯ + ▯▯ + ═ + ▯▯▯ + ▯═ + ═▯ + ···

in equation (7.1). Indeed, we obtain $S$ from $T$ if we replace each ▯ (a vertical domino) by T and each ═ (a pair of horizontal dominoes) by HT, then tack on an HH at the end. This correspondence is easy to prove because each element of $\Omega$ has the form $(\mathrm{T} + \mathrm{HT})^n\,\mathrm{HH}$ for some $n \ge 0$, and each term of $T$ has the form (▯ + ═)$^n$. Therefore by (7.4) we have

$$S = (1 - \mathrm{T} - \mathrm{HT})^{-1}\,\mathrm{HH},$$

and the probability generating function for our problem is

$$G(z) = \bigl(1 - qz - (pz)(qz)\bigr)^{-1}(pz)^2 = \frac{p^2z^2}{1 - qz - pqz^2}. \tag{8.64}$$

Our experience with the negative binomial distribution gives us a clue that we can most easily calculate the mean and variance of (8.64) by writing

$$G(z) = \frac{z^2}{F(z)},$$

where

$$F(z) = \frac{1 - qz - pqz^2}{p^2},$$

and by calculating the "mean" and "variance" of this pseudo-pgf $F(z)$. (Once again we've introduced a function with $F(1) = 1$.) We have

$$F'(1) = (-q - 2pq)/p^2 = 2 - p^{-1} - p^{-2};$$

$$F''(1) = -2pq/p^2 = 2 - 2p^{-1}.$$

Therefore, since $z^2 = F(z)G(z)$, $\mathrm{Mean}(z^2) = 2$, and $\mathrm{Var}(z^2) = 0$, the mean and variance of distribution $G(z)$ are

$$\mathrm{Mean}(G) = 2 - \mathrm{Mean}(F) = p^{-2} + p^{-1}; \tag{8.65}$$

$$\mathrm{Var}(G) = -\mathrm{Var}(F) = p^{-4} + 2p^{-3} - 2p^{-2} - p^{-1}. \tag{8.66}$$

When $p = \frac12$ the mean and variance are 6 and 22, respectively. (Exercise 4 discusses the calculation of means and variances by subtraction.)

Now let's try a more intricate experiment: We will flip coins until the pattern THTTH is first obtained. The sum of winning positions is now

$$S = \mathrm{THTTH} + \mathrm{HTHTTH} + \mathrm{TTHTTH} + \mathrm{HHTHTTH} + \mathrm{HTTHTTH} + \mathrm{THTHTTH} + \mathrm{TTTHTTH} + \cdots;$$

this sum is more difficult to describe than the previous one. If we go back to the method by which we solved the domino problems in Chapter 7, we can obtain a formula for $S$ by considering it as a "finite state language" defined by the following "automaton":

[Figure: an automaton with states 0 through 5 and transitions labeled H and T; state $k$ records that the last $k$ flips match the first $k$ characters of THTTH.]

"'You really are an
automaton-a calculating
machine,' I cried. 'There is
something positively
inhuman in you at
times.'"

-J. H. Watson [70]


The  elementary  events  in  the  probability  space  are  the  sequences  of  H’s  and 
T’s  that  lead  from  state  0 to  state  5.  Suppose,  for  example,  that  we  have 
just  seen  THT;  then  we  are  in  state  3.  Flipping  tails  now  takes  us  to  state  4; 
flipping  heads  in  state  3 would  take  us  to  state  2 (not  all  the  way  back  to 
state  0,  since  the  TH  we’ve  just  seen  may  be  followed  by  TTH). 

In this formulation, we can let $S_k$ be the sum of all sequences of H's and T's that lead to state $k$; it follows that

$$\begin{aligned}
S_0 &= 1 + S_0\,\mathrm{H} + S_2\,\mathrm{H},\\
S_1 &= S_0\,\mathrm{T} + S_1\,\mathrm{T} + S_4\,\mathrm{T},\\
S_2 &= S_1\,\mathrm{H} + S_3\,\mathrm{H},\\
S_3 &= S_2\,\mathrm{T},\\
S_4 &= S_3\,\mathrm{T},\\
S_5 &= S_4\,\mathrm{H}.
\end{aligned}$$

Now the sum $S$ in our problem is $S_5$; we can obtain it by solving these six equations in the six unknowns $S_0, S_1, \ldots, S_5$. Replacing H by $pz$ and T by $qz$ gives generating functions where the coefficient of $z^n$ in $S_k$ is the probability that we are in state $k$ after $n$ flips.
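The state equations can also be followed numerically. The Python sketch below builds the transition table for an arbitrary pattern and propagates the state probabilities flip by flip, so that `probs[n]` is the coefficient of $z^n$ in $S_5$ when the pattern is THTTH. The function names `automaton` and `first_occurrence_probs` are assumptions made for this illustration.

```python
from fractions import Fraction

def automaton(pattern):
    """delta[state][flip] = next state, where a state is the length of the
    longest prefix of `pattern` that is a suffix of the flips seen so far."""
    m = len(pattern)
    delta = []
    for state in range(m):
        row = {}
        for flip in "HT":
            seen = pattern[:state] + flip
            k = min(len(seen), m)
            while not seen.endswith(pattern[:k]):
                k -= 1                      # fall back to a shorter prefix
            row[flip] = k
        delta.append(row)
    return delta

def first_occurrence_probs(pattern, p, max_flips):
    """probs[n] = Pr(pattern is first completed on flip n), 0 <= n <= max_flips."""
    q = 1 - p
    m = len(pattern)
    delta = automaton(pattern)
    state_prob = [Fraction(0)] * m
    state_prob[0] = Fraction(1)             # start in state 0
    probs = [Fraction(0)]
    for _ in range(max_flips):
        nxt = [Fraction(0)] * m
        done = Fraction(0)
        for state, pr in enumerate(state_prob):
            for flip, weight in (("H", p), ("T", q)):
                target = delta[state][flip]
                if target == m:
                    done += pr * weight     # reached the final state
                else:
                    nxt[target] += pr * weight
        probs.append(done)
        state_prob = nxt
    return probs
```

With a fair coin, `first_occurrence_probs("THTTH", Fraction(1, 2), 7)` assigns probability 1/32 to each of 5, 6, and 7 flips, in agreement with the one, two, and four winning sequences of those lengths listed in $S$.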

In the same way, any diagram of transitions between states, where the transition from state $j$ to state $k$ occurs with given probability, leads to a set of simultaneous linear equations whose solutions are generating functions for the state probabilities after $n$ transitions have occurred. Systems of this kind are called Markov processes, and the theory of their behavior is intimately related to the theory of linear equations.

But the coin-flipping problem can be solved in a much simpler way, without the complexities of the general finite-state approach. Instead of six equations in six unknowns $S_0, S_1, \ldots, S_5$, we can characterize $S$ with only two equations in two unknowns. The trick is to consider the auxiliary sum $N = S_0 + S_1 + S_2 + S_3 + S_4$ of all flip sequences that don't contain any occurrences of the given pattern THTTH:

$$N = 1 + \mathrm{H} + \mathrm{T} + \mathrm{HH} + \cdots + \mathrm{THTHT} + \mathrm{THTTT} + \cdots.$$

We  have 


1+N(H  + T)=  N + S,  (8.67) 

because  every  term  on  the  left  either  ends  with  THTTH  (and  belongs  to  S)  or 
doesn’t  (and  belongs  to  N);  conversely,  every  term  on  the  right  is  either  empty 
or  belongs  to  N H or  N T.  And  we  also  have  the  important  additional  equation 

$$N\,\mathrm{THTTH} = S + S\,\mathrm{TTH}, \tag{8.68}$$


because  every  term  on  the  left  completes  a term  of  S after  either  the  first  H 
or  the  second  H,  and  because  every  term  on  the  right  belongs  to  the  left. 

The solution to these two simultaneous equations is easily obtained: We have $N = (1 - S)(1 - \mathrm{H} - \mathrm{T})^{-1}$ from (8.67), hence

$$(1 - S)(1 - \mathrm{T} - \mathrm{H})^{-1}\,\mathrm{THTTH} = S(1 + \mathrm{TTH}).$$


As before, we get the probability generating function $G(z)$ for the number of flips if we replace H by $pz$ and T by $qz$. A bit of simplification occurs since $p + q = 1$, and we find

$$\frac{\bigl(1 - G(z)\bigr)\,p^2q^3z^5}{1 - z} = G(z)(1 + pq^2z^3);$$

hence the solution is

$$G(z) = \frac{p^2q^3z^5}{p^2q^3z^5 + (1 + pq^2z^3)(1 - z)}. \tag{8.69}$$


Notice that $G(1) = 1$, if $pq \ne 0$; we do eventually encounter the pattern THTTH, with probability 1, unless the coin is rigged so that it always comes up heads or always tails.

To get the mean and variance of the distribution (8.69), we invert $G(z)$ as we did in the previous problem, writing $G(z) = z^5/F(z)$ where $F$ is a polynomial:

$$F(z) = \frac{p^2q^3z^5 + (1 + pq^2z^3)(1 - z)}{p^2q^3}. \tag{8.70}$$

The relevant derivatives are

$$F'(1) = 5 - (1 + pq^2)/p^2q^3,$$

$$F''(1) = 20 - 6pq^2/p^2q^3;$$

and if $X$ is the number of flips we get

$$EX = \mathrm{Mean}(G) = 5 - \mathrm{Mean}(F) = p^{-2}q^{-3} + p^{-1}q^{-1}; \tag{8.71}$$

$$\begin{aligned}
VX = \mathrm{Var}(G) &= -\mathrm{Var}(F)\\
&= -25 + p^{-2}q^{-3} + 7p^{-1}q^{-1} + \mathrm{Mean}(F)^2\\
&= (EX)^2 - 9p^{-2}q^{-3} - 3p^{-1}q^{-1}.
\end{aligned} \tag{8.72}$$

When $p = \frac12$, the mean and variance are 36 and 996.

Let's get general: The problem we have just solved was "random" enough to show us how to analyze the case that we are waiting for the first appearance of an arbitrary pattern $A$ of heads and tails. Again we let $S$ be the sum of all winning sequences of H's and T's, and we let $N$ be the sum of all sequences that haven't encountered the pattern $A$ yet. Equation (8.67) will remain the same; equation (8.68) will become

$$NA = S\bigl(1 + A^{(1)}[A^{(m-1)} = A_{(m-1)}] + A^{(2)}[A^{(m-2)} = A_{(m-2)}] + \cdots + A^{(m-1)}[A^{(1)} = A_{(1)}]\bigr), \tag{8.73}$$

where $m$ is the length of $A$, and where $A^{(k)}$ and $A_{(k)}$ denote respectively the last $k$ characters and the first $k$ characters of $A$. For example, if $A$ is the pattern THTTH we just studied, we have

$$A^{(1)} = \mathrm{H},\quad A^{(2)} = \mathrm{TH},\quad A^{(3)} = \mathrm{TTH},\quad A^{(4)} = \mathrm{HTTH};$$

$$A_{(1)} = \mathrm{T},\quad A_{(2)} = \mathrm{TH},\quad A_{(3)} = \mathrm{THT},\quad A_{(4)} = \mathrm{THTT}.$$

Since the only perfect match is $A^{(2)} = A_{(2)}$, equation (8.73) reduces to (8.68).

Let $\tilde A$ be the result of substituting $p^{-1}$ for H and $q^{-1}$ for T in the pattern $A$. Then it is not difficult to generalize our derivation of (8.71) and (8.72) to conclude (exercise 20) that the general mean and variance are

$$EX = \sum_{k=1}^{m} \tilde A_{(k)}\,[A^{(k)} = A_{(k)}]; \tag{8.74}$$

$$VX = (EX)^2 - \sum_{k=1}^{m} (2k-1)\,\tilde A_{(k)}\,[A^{(k)} = A_{(k)}]. \tag{8.75}$$

In the special case $p = \frac12$ we can interpret these formulas in a particularly simple way. Given a pattern $A$ of $m$ heads and tails, let

$$A{:}A = \sum_{k=1}^{m} 2^{k-1}\,[A^{(k)} = A_{(k)}]. \tag{8.76}$$

We can easily find the binary representation of this number by placing a '1' under each position such that the string matches itself perfectly when it is superimposed on a copy of itself that has been shifted to start in this position:

    A   =  HTHTHHTHTH
    A:A = (1000010101)_2 = 512 + 16 + 4 + 1 = 533
           HTHTHHTHTH           ✓
            HTHTHHTHTH
             HTHTHHTHTH
              HTHTHHTHTH
               HTHTHHTHTH
                HTHTHHTHTH      ✓
                 HTHTHHTHTH
                  HTHTHHTHTH    ✓
                   HTHTHHTHTH
                    HTHTHHTHTH  ✓

Equation (8.74) now tells us that the expected number of flips until pattern $A$ appears is exactly $2(A{:}A)$, if we use a fair coin, because $\tilde A_{(k)} = 2^k$ when $p = q = \frac12$. This result, first discovered by the Soviet mathematician A. D. Solov'ev in 1966 [271], seems paradoxical at first glance: Patterns with no self-overlaps occur sooner than overlapping patterns do! It takes almost twice as long to encounter HHHHH as it does to encounter HHHHT or THHHH.
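Formulas (8.74) through (8.76) are easy to evaluate mechanically. The following Python sketch does so with exact fractions; the function names `waiting_time_stats` and `self_correlation` are hypothetical, and patterns are assumed to be strings over the letters H and T.

```python
from fractions import Fraction

def waiting_time_stats(pattern, p=Fraction(1, 2)):
    """Mean and variance of the number of flips until `pattern` first
    appears, following (8.74) and (8.75); p is the probability of heads."""
    q = 1 - p
    m = len(pattern)
    mean = Fraction(0)
    correction = Fraction(0)
    for k in range(1, m + 1):
        if pattern[m - k:] == pattern[:k]:       # [A^(k) = A_(k)]
            weight = Fraction(1)
            for c in pattern[:k]:                # A-tilde_(k): H -> 1/p, T -> 1/q
                weight /= p if c == "H" else q
            mean += weight
            correction += (2 * k - 1) * weight
    return mean, mean * mean - correction

def self_correlation(pattern):
    """The number A:A of (8.76)."""
    m = len(pattern)
    return sum(2 ** (k - 1) for k in range(1, m + 1)
               if pattern[m - k:] == pattern[:k])
```

For a fair coin this reproduces the values quoted in the text: `waiting_time_stats("THTTH")` gives mean 36 and variance 996, `waiting_time_stats("HH")` gives 6 and 22, and the mean for HHHHH is 62 against 32 for HHHHT or THHHH.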

Now let's consider an amusing game that was invented by (of all people) Walter Penney [231] in 1969. Alice and Bill flip a coin until either HHT or HTT occurs; Alice wins if the pattern HHT comes first, Bill wins if HTT comes first. This game (now called "Penney ante") certainly seems to be fair, if played with a fair coin, because both patterns HHT and HTT have the same characteristics if we look at them in isolation: The probability generating function for the waiting time until HHT first occurs is

$$G(z) = \frac{z^3}{z^3 - 8(z - 1)},$$

and the same is true for HTT. Therefore neither Alice nor Bill has an advantage, if they play solitaire.

"Chem bol'she
periodov u nashego
slova, tem pozzhe
ono poiavliaetsia."
[The more periods our word has, the later it appears.]
-A. D. Solov'ev

Of course not! Who
could they have an
advantage over?

But there's an interesting interplay between the patterns when both are considered simultaneously. Let $S_A$ be the sum of Alice's winning configurations, and let $S_B$ be the sum of Bill's:

$$S_A = \mathrm{HHT} + \mathrm{HHHT} + \mathrm{THHT} + \mathrm{HHHHT} + \mathrm{HTHHT} + \mathrm{THHHT} + \cdots;$$

$$S_B = \mathrm{HTT} + \mathrm{THTT} + \mathrm{HTHTT} + \mathrm{TTHTT} + \mathrm{THTHTT} + \mathrm{TTTHTT} + \cdots.$$

Also, taking our cue from the trick that worked when only one pattern was involved, let us denote by $N$ the sum of all sequences in which neither player has won so far:

$$N = 1 + \mathrm{H} + \mathrm{T} + \mathrm{HH} + \mathrm{HT} + \mathrm{TH} + \mathrm{TT} + \mathrm{HHH} + \mathrm{HTH} + \mathrm{THH} + \cdots. \tag{8.77}$$

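Then we can easily verify the following set of equations:

$$\begin{aligned}
1 + N(\mathrm{H} + \mathrm{T}) &= N + S_A + S_B;\\
N\,\mathrm{HHT} &= S_A;\\
N\,\mathrm{HTT} &= S_A\,\mathrm{T} + S_B.
\end{aligned} \tag{8.78}$$

If we now set $\mathrm{H} = \mathrm{T} = \frac12$, the resulting value of $S_A$ becomes the probability that Alice wins, and $S_B$ becomes the probability that Bill wins. The three equations reduce to

$$1 + N = N + S_A + S_B; \qquad \tfrac18 N = S_A; \qquad \tfrac18 N = \tfrac12 S_A + S_B;$$

and we find $S_A = \frac23$, $S_B = \frac13$. Alice will win about twice as often as Bill!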

In a generalization of this game, Alice and Bill choose patterns $A$ and $B$ of heads and tails, and they flip coins until either $A$ or $B$ appears. The two patterns need not have the same length, but we assume that $A$ doesn't occur within $B$, nor does $B$ occur within $A$. (Otherwise the game would be degenerate. For example, if $A$ = HT and $B$ = THTH, poor Bill could never win; and if $A$ = HTH and $B$ = TH, both players might claim victory simultaneously.) Then we can write three equations analogous to (8.73) and (8.78):

$$\begin{aligned}
1 + N(\mathrm{H} + \mathrm{T}) &= N + S_A + S_B;\\
NA &= S_A \sum_{k=1}^{l} A^{(l-k)}\,[A^{(k)} = A_{(k)}] + S_B \sum_{k=1}^{\min(l,m)} A^{(l-k)}\,[B^{(k)} = A_{(k)}];\\
NB &= S_A \sum_{k=1}^{\min(l,m)} B^{(m-k)}\,[A^{(k)} = B_{(k)}] + S_B \sum_{k=1}^{m} B^{(m-k)}\,[B^{(k)} = B_{(k)}].
\end{aligned} \tag{8.79}$$

Here $l$ is the length of $A$ and $m$ is the length of $B$. For example, if we have $A$ = HTTHTHTH and $B$ = THTHTTH, the two pattern-dependent equations are

$$N\,\mathrm{HTTHTHTH} = S_A\,\mathrm{TTHTHTH} + S_A + S_B\,\mathrm{TTHTHTH} + S_B\,\mathrm{THTH};$$

$$N\,\mathrm{THTHTTH} = S_A\,\mathrm{THTTH} + S_A\,\mathrm{TTH} + S_B\,\mathrm{THTTH} + S_B.$$

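We obtain the victory probabilities by setting $\mathrm{H} = \mathrm{T} = \frac12$, if we assume that a fair coin is being used; this reduces the two crucial equations to

$$\begin{aligned}
N &= S_A \sum_{k=1}^{l} 2^k\,[A^{(k)} = A_{(k)}] + S_B \sum_{k=1}^{\min(l,m)} 2^k\,[B^{(k)} = A_{(k)}];\\
N &= S_A \sum_{k=1}^{\min(l,m)} 2^k\,[A^{(k)} = B_{(k)}] + S_B \sum_{k=1}^{m} 2^k\,[B^{(k)} = B_{(k)}].
\end{aligned} \tag{8.80}$$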

We can see what's going on if we generalize the $A{:}A$ operation of (8.76) to a function of two independent strings $A$ and $B$:

$$A{:}B = \sum_{k=1}^{\min(l,m)} 2^{k-1}\,[A^{(k)} = B_{(k)}]. \tag{8.81}$$

Equations (8.80) now become simply

$$S_A(A{:}A) + S_B(B{:}A) = S_A(A{:}B) + S_B(B{:}B);$$

the odds in Alice's favor are

$$\frac{S_A}{S_B} = \frac{B{:}B - B{:}A}{A{:}A - A{:}B}. \tag{8.82}$$

(This beautiful formula was discovered by John Horton Conway [111].)

For example, if $A$ = HTTHTHTH and $B$ = THTHTTH as above, we have $A{:}A = (10000001)_2 = 129$, $A{:}B = (0001010)_2 = 10$, $B{:}A = (0001001)_2 = 9$, and $B{:}B = (1000010)_2 = 66$; so the ratio $S_A/S_B$ is $(66 - 9)/(129 - 10) = 57/119$. Alice will win this one only 57 times out of every 176, on the average.
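Conway's formula (8.82) translates directly into a few lines of code. In the Python sketch below, `correlation` computes $A{:}B$ as defined in (8.81) and `odds_for_first` returns $S_A/S_B$ for a fair coin; both names are assumptions introduced for this example, and the patterns are assumed to satisfy the non-degeneracy condition stated above.

```python
from fractions import Fraction

def correlation(a, b):
    """A:B of (8.81): bit k-1 is set when the last k flips of `a`
    equal the first k flips of `b`."""
    return sum(2 ** (k - 1)
               for k in range(1, min(len(a), len(b)) + 1)
               if a[len(a) - k:] == b[:k])

def odds_for_first(a, b):
    """S_A / S_B from (8.82): odds that pattern `a` appears before `b`
    when a fair coin is flipped."""
    return Fraction(correlation(b, b) - correlation(b, a),
                    correlation(a, a) - correlation(a, b))
```

With $A$ = HTTHTHTH and $B$ = THTHTTH the four correlations come out as 129, 10, 9, and 66, giving odds 57/119; for the original game, `odds_for_first("HHT", "HTT")` is 2, matching $S_A = \frac23$ and $S_B = \frac13$.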

Strange things can happen in Penney's game. For example, the pattern HHTH wins over the pattern HTHH with 3/2 odds, and HTHH wins over THHH with 7/5 odds. So HHTH ought to be much better than THHH. Yet THHH actually wins over HHTH, with 7/5 odds! The relation between patterns is not transitive. In fact, exercise 57 proves that if Alice chooses any pattern $\tau_1\tau_2\ldots\tau_l$ of length $l \ge 3$, Bill can always ensure better than even chances of winning if he chooses the pattern $\bar\tau_2\tau_1\tau_2\ldots\tau_{l-1}$, where $\bar\tau_2$ is the heads/tails opposite of $\tau_2$.

Odd, odd.

Somehow the verb
"to hash" magically
became standard
terminology for key
transformation during
the mid-1960s,
yet nobody was rash
enough to use such
an undignified word
publicly until 1967.
-D. E. Knuth [175]


8.5  HASHING 

Let's conclude this chapter by applying probability theory to computer programming. Several important algorithms for storing and retrieving information inside a computer are based on a technique called "hashing." The general problem is to maintain a set of records that each contain a "key" value, $K$, and some data $D(K)$ about that key; we want to be able to find $D(K)$ quickly when $K$ is given. For example, each key might be the name of a student, and the associated data might be that student's homework grades.

In practice, computers don't have enough capacity to set aside one memory cell for every possible key; billions of keys are possible, but comparatively few keys are actually present in any one application. One solution to the problem is to maintain two tables KEY[j] and DATA[j] for $1 \le j \le N$, where $N$ is the total number of records that can be accommodated; another variable $n$ tells how many records are actually present. Then we can search for a given key $K$ by going through the table sequentially in an obvious way:

S1  Set j := 1. (We've searched through all positions < j.)

S2  If j > n, stop. (The search was unsuccessful.)

S3  If KEY[j] = K, stop. (The search was successful.)

S4  Increase j by 1 and return to step S2. (We'll try again.)

After  a successful  search,  the  desired  data  entry  D(K)  appears  in  DATA[j]. 
After  an  unsuccessful  search,  we  can  insert  K and  D(K)  into  the  table  by 
setting 

n := j,  KEY[n] := K,  DATA[n] := D(K),

assuming  that  the  table  was  not  already  filled  to  capacity. 

This  method  works,  but  it  can  be  dreadfully  slow;  we  need  to  repeat 
step  S2  a total  of  n + 1 times  whenever  an  unsuccessful  search  is  made,  and 
n can  be  quite  large. 

Hashing was invented to speed things up. The basic idea, in one of its popular forms, is to use $m$ separate lists instead of one giant list. A "hash function" transforms every possible key $K$ into a list number $h(K)$ between 1 and $m$. An auxiliary table FIRST[i] for $1 \le i \le m$ points to the first record in list $i$; another auxiliary table NEXT[j] for $1 \le j \le N$ points to the record following record $j$ in its list. We assume that

FIRST[i] = -1, if list i is empty;

NEXT[j] = 0, if record j is the last in its list.

As before, there's a variable $n$ that tells how many records have been stored altogether.

For example, suppose the keys are names, and suppose that there are $m = 4$ lists based on the first letter of a name:

$$h(\text{name}) = \begin{cases} 1, & \text{for A-F};\\ 2, & \text{for G-L};\\ 3, & \text{for M-R};\\ 4, & \text{for S-Z}. \end{cases}$$

We  start  with  four  empty  lists  and  with  n = 0.  If,  say,  the  first  record  has 
Nora  as  its  key,  we  have  h(Nora)  = 3,  so  Nora  becomes  the  key  of  the  first 
item  in  list  3.  If  the  next  two  names  are  Glenn  and  Jim,  they  both  go  into 
list  2.  Now  the  tables  in  memory  look  like  this: 

FIRST[1] = -1,  FIRST[2] = 2,  FIRST[3] = 1,  FIRST[4] = -1;

KEY[1] = Nora,   NEXT[1] = 0;

KEY[2] = Glenn,  NEXT[2] = 3;

KEY[3] = Jim,    NEXT[3] = 0;    n = 3.

(The  values  of  DATA  [1],  DATA  [2],  and  DATA  [3]  are  confidential  and  will  not 
be  shown.)  After  18  records  have  been  inserted,  the  lists  might  contain  the 
names 


    list 1    list 2      list 3     list 4
    Dianne    Glenn       Nora       Scott
    Ari       Jim         Mike       Tina
    Brian     Jennifer    Michael
    Fran      Joan        Ray
    Doug      Jerry       Paula
              Jean

and  these  names  would  appear  intermixed  in  the  KEY  array  with  NEXT  entries 
to  keep  the  lists  effectively  separate.  If  we  now  want  to  search  for  John,  we 
have  to  scan  through  the  six  names  in  list  2 (which  happens  to  be  the  longest 
list);  but  that’s  not  nearly  as  bad  as  looking  at  all  18  names. 

Here’s  a precise  specification  of  the  algorithm  that  searches  for  key  K in 
accordance  with  this  scheme: 

H1  Set i := h(K) and j := FIRST[i].

H2  If j ≤ 0, stop. (The search was unsuccessful.)

H3  If KEY[j] = K, stop. (The search was successful.)

H4  Set i := j, then set j := NEXT[i] and return to step H2. (We'll try again.)

For example, to search for Jennifer in the example given, step H1 would set i := 2 and j := 2; step H3 would find that Glenn ≠ Jennifer; step H4 would set j := 3; and step H3 would find Jim ≠ Jennifer.


Let's hear it for
the Concrete Math
students who sat in
the front rows and
lent their names to
this experiment.

I bet their parents
are glad about that.

After a successful search, the desired data $D(K)$ appears in DATA[j], as in the previous algorithm. After an unsuccessful search, we can enter $K$ and $D(K)$ in the table by doing the following operations:

    n := n + 1;
    if j < 0 then FIRST[i] := n else NEXT[i] := n;
    KEY[n] := K;  DATA[n] := D(K);  NEXT[n] := 0.          (8.83)

Now  the  table  will  once  again  be  up  to  date. 
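Steps H1 through H4 and the insertion operations (8.83) can be sketched in Python as follows. The class name `ChainedTable`, the helper `first_letter_hash`, and the probe counter returned by `search` are assumptions made for illustration; index 0 of each array is left unused so that record numbers agree with the 1-based tables of the text.

```python
class ChainedTable:
    """Sketch of the FIRST/NEXT scheme with 1-based record numbers."""

    def __init__(self, m, capacity, h):
        self.h = h                              # hash function: key -> 1..m
        self.first = [-1] * (m + 1)             # FIRST[i] = -1: list i is empty
        self.next = [0] * (capacity + 1)        # NEXT[j] = 0: j is last in its list
        self.key = [None] * (capacity + 1)
        self.data = [None] * (capacity + 1)
        self.n = 0                              # number of records stored

    def search(self, k):
        """Steps H1-H4. Returns (j, i, probes); j <= 0 means unsuccessful."""
        i = self.h(k)                           # H1
        j = self.first[i]
        probes = 0
        while j > 0:                            # H2
            probes += 1
            if self.key[j] == k:                # H3
                return j, i, probes
            i, j = j, self.next[j]              # H4
        return j, i, probes

    def insert(self, k, d):
        """Operations (8.83) after an unsuccessful search; returns the record number."""
        j, i, _ = self.search(k)
        if j > 0:
            return j                            # key already present
        if self.n >= len(self.key) - 1:
            raise OverflowError("table is full")
        self.n += 1
        if j < 0:
            self.first[i] = self.n              # list i was empty
        else:
            self.next[i] = self.n               # append after record i
        self.key[self.n] = k
        self.data[self.n] = d
        self.next[self.n] = 0
        return self.n

def first_letter_hash(name):
    """The example hash function: A-F -> 1, G-L -> 2, M-R -> 3, S-Z -> 4."""
    c = name[0].upper()
    return 1 if c <= "F" else 2 if c <= "L" else 3 if c <= "R" else 4
```

Inserting Nora, Glenn, and Jim into `ChainedTable(4, 18, first_letter_hash)` reproduces the FIRST and NEXT values shown earlier, and a subsequent search for Jennifer fails after two probes.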

We  hope  to  get  lists  of  roughly  equal  length,  because  this  will  make  the 
task  of  searching  about  m times  faster.  The  value  of  m is  usually  much  greater 
than  4,  so  a factor  of  1/m  will  be  a significant  improvement. 

We  don’t  know  in  advance  what  keys  will  be  present,  but  it  is  generally 
possible  to  choose  the  hash  function  h so  that  we  can  consider  h(K)  to  be  a 
random  variable  that  is  uniformly  distributed  between  1 and  m,  independent 
of  the  hash  values  of  other  keys  that  are  present.  In  such  cases  computing  the 
hash  function  is  like  rolling  a die  that  has  m faces.  There’s  a chance  that  all 
the  records  will  fall  into  the  same  list,  just  as  there’s  a chance  that  a die  will 
always turn up ⚀; but probability theory tells us that the lists will almost
always  be  pretty  evenly  balanced. 

Analysis of Hashing: Introduction.

"Algorithmic analysis" is a branch of computer science that derives quantitative information about the efficiency of computer methods. "Probabilistic analysis of an algorithm" is the study of an algorithm's running time, considered as a random variable that depends on assumed characteristics of the input data. Hashing is an especially good candidate for probabilistic analysis, because it is an extremely efficient method on the average, even though its worst case is too horrible to contemplate. (The worst case occurs when all keys have the same hash value.) Indeed, a computer programmer who uses hashing had better be a believer in probability theory.

Let  P be  the  number  of  times  step  H3  is  performed  when  the  algorithm 
above  is  used  to  carry  out  a search.  (Each  execution  of  H3  is  called  a “probe” 
in  the  table.)  If  we  know  P,  we  know  how  often  each  step  is  performed, 
depending  on  whether  the  search  is  successful  or  unsuccessful: 


    Step    Unsuccessful search    Successful search
    H1      1 time                 1 time
    H2      P + 1 times            P times
    H3      P times                P times
    H4      P times                P - 1 times



Thus  the  main  quantity  that  governs  the  running  time  of  the  search  procedure 
is  the  number  of  probes,  P. 

We  can  get  a good  mental  picture  of  the  algorithm  by  imagining  that  we 
are  keeping  an  address  book  that  is  organized  in  a special  way,  with  room  for 
only  one  entry  per  page.  On  the  cover  of  the  book  we  note  down  the  page 
number  for  the  first  entry  in  each  of  m lists;  each  name  K determines  the  list 
h(K)  that  it  belongs  to.  Every  page  inside  the  book  refers  to  the  successor 
page  in  its  list.  The  number  of  probes  needed  to  find  an  address  in  such  a 
book  is  the  number  of  pages  we  must  consult. 

If $n$ items have been inserted, their positions in the table depend only on their respective hash values, $(h_1, h_2, \ldots, h_n)$. Each of the $m^n$ possible sequences $(h_1, h_2, \ldots, h_n)$ is considered to be equally likely, and $P$ is a random variable depending on such a sequence.

Case 1: The key is not present.

Check under the
doormat.

Let's consider first the behavior of $P$ in an unsuccessful search, assuming that $n$ records have previously been inserted into the hash table. In this case the relevant probability space consists of $m^{n+1}$ elementary events

$$\omega = (h_1, h_2, \ldots, h_n; h_{n+1})$$

where $h_j$ is the hash value of the $j$th key inserted, and where $h_{n+1}$ is the hash value of the key for which the search is unsuccessful. We assume that the hash function $h$ has been chosen properly so that $\Pr(\omega) = 1/m^{n+1}$ for every such $\omega$.

For example, if $m = n = 2$, there are eight equally likely possibilities:

    h1  h2  h3 :  P
     1   1   1 :  2
     1   1   2 :  0
     1   2   1 :  1
     1   2   2 :  1
     2   1   1 :  1
     2   1   2 :  1
     2   2   1 :  0
     2   2   2 :  2

If $h_1 = h_2 = h_3$ we make two unsuccessful probes before concluding that the new key $K$ is not present; if $h_1 = h_2 \ne h_3$ we make none; and so on. This list of all possibilities shows that $P$ has a probability distribution given by the pgf $(\frac14 + \frac12 z + \frac14 z^2) = (\frac12 + \frac12 z)^2$, when $m = n = 2$.

An unsuccessful search makes one probe for every item in list number $h_{n+1}$, so we have the general formula

$$P = [h_1 = h_{n+1}] + [h_2 = h_{n+1}] + \cdots + [h_n = h_{n+1}]. \tag{8.84}$$

The probability that $h_j = h_{n+1}$ is $1/m$, for $1 \le j \le n$; so it follows that

$$EP = E[h_1 = h_{n+1}] + E[h_2 = h_{n+1}] + \cdots + E[h_n = h_{n+1}] = \frac nm.$$

Maybe we should do that more slowly: Let $X_j$ be the random variable

$$X_j = X_j(\omega) = [h_j = h_{n+1}].$$

Then $P = X_1 + \cdots + X_n$, and $EX_j = 1/m$ for all $j \le n$; hence

$$EP = EX_1 + \cdots + EX_n = n/m.$$


Good: As we had hoped, the average number of probes is $1/m$ times what it was without hashing. Furthermore the random variables $X_j$ are independent, and they each have the same probability generating function

$$X_j(z) = \frac{m - 1 + z}{m};$$

therefore the pgf for the total number of probes in an unsuccessful search is

$$P(z) = X_1(z) \cdots X_n(z) = \left(\frac{m - 1 + z}{m}\right)^{n}. \tag{8.85}$$

This is a binomial distribution, with $p = 1/m$ and $q = (m - 1)/m$; in other words, the number of probes in an unsuccessful search behaves just like the number of heads when we toss a biased coin whose probability of heads is $1/m$ on each toss. Equation (8.61) tells us that the variance of $P$ is therefore

$$npq = \frac{n(m - 1)}{m^2}.$$

When $m$ is large, the variance of $P$ is approximately $n/m$, so the standard deviation is approximately $\sqrt{n/m}$.
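For small tables the claim that $P$ is binomially distributed can be confirmed by brute force. The Python sketch below enumerates all $m^{n+1}$ equally likely events and applies (8.84) to each one; the function names are assumptions introduced for this check.

```python
from fractions import Fraction
from itertools import product
from math import comb

def unsuccessful_probe_distribution(m, n):
    """Exact distribution of P over all m**(n+1) equally likely events
    (h_1, ..., h_n; h_{n+1}), using formula (8.84)."""
    counts = [0] * (n + 1)
    for hashes in product(range(1, m + 1), repeat=n + 1):
        target = hashes[-1]                       # h_{n+1}
        counts[sum(h == target for h in hashes[:-1])] += 1
    total = m ** (n + 1)
    return [Fraction(c, total) for c in counts]

def binomial_pgf_coefficients(m, n):
    """Coefficients of ((m - 1 + z)/m)**n, the pgf in (8.85)."""
    p = Fraction(1, m)
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
```

For $m = n = 2$ the enumeration returns the coefficients $\frac14, \frac12, \frac14$ found from the table of eight possibilities. The enumeration takes time proportional to $m^{n+1}$, so it is only practical for very small cases.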

Case 2: The key is present.

Now let's look at successful searches. In this case the appropriate probability space is a bit more complicated, depending on our application: We will let $\Omega$ be the set of all elementary events

$$\omega = (h_1, \ldots, h_n; k), \tag{8.86}$$

where $h_j$ is the hash value for the $j$th key as before, and where $k$ is the index of the key being sought (the key whose hash value is $h_k$). Thus we have $1 \le h_j \le m$ for $1 \le j \le n$, and $1 \le k \le n$; there are $m^n \cdot n$ elementary events $\omega$ in all.

Let $s_j$ be the probability that we are searching for the $j$th key that was inserted into the table. Then

$$\Pr(\omega) = s_k/m^n \tag{8.87}$$

if $\omega$ is the event (8.86). (Some applications search most often for the items that were inserted first, or for the items that were inserted last, so we will not assume that each $s_j = 1/n$.) Notice that $\sum_{\omega\in\Omega} \Pr(\omega) = \sum_{k=1}^{n} s_k = 1$, hence (8.87) defines a legal probability distribution.

The number of probes $P$ in a successful search is $p$ if key $K$ was the $p$th key to be inserted into its list. Therefore

$$P = [h_1 = h_k] + [h_2 = h_k] + \cdots + [h_k = h_k];$$

or, if we let $X_j$ be the random variable $[h_j = h_k]$, we have

$$P = X_1 + X_2 + \cdots + X_k. \tag{8.88}$$

Suppose, for example, that we have $m = 10$ and $n = 16$, and that the hash values have the following "random" pattern:

    (h_1, ..., h_16) = 3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3;
    (P_1, ..., P_16) = 1 1 1 2 1 1 1 1 2 2 3 1 2 1 3 3.

The number of probes $P_j$ needed to find the $j$th key is shown below $h_j$.

Equation (8.88) represents $P$ as a sum of random variables, but we can't simply calculate $EP$ as $EX_1 + \cdots + EX_k$ because the quantity $k$ itself is a random variable. What is the probability generating function for $P$? To answer this question we should digress a moment to talk about conditional probability.

If $A$ and $B$ are events in a probability space, we say that the conditional probability of $A$, given $B$, is

$$\Pr(\omega \in A \mid \omega \in B) = \frac{\Pr(\omega \in A \cap B)}{\Pr(\omega \in B)}. \tag{8.89}$$

For example, if $X$ and $Y$ are random variables, the conditional probability of the event $X = x$, given that $Y = y$, is

$$\Pr(X = x \mid Y = y) = \frac{\Pr(X = x \text{ and } Y = y)}{\Pr(Y = y)}. \tag{8.90}$$

For any fixed $y$ in the range of $Y$, the sum of these conditional probabilities over all $x$ in the range of $X$ is $\Pr(Y = y)/\Pr(Y = y) = 1$; therefore (8.90) defines a probability distribution, and we can define a new random variable '$X|y$' such that $\Pr\bigl((X|y) = x\bigr) = \Pr(X = x \mid Y = y)$.


Where  have  I seen 
that  pattern  before? 


Equation  (8.43)  was 
also  a momentary 
digression, 




If X and Y are independent, the random variable $X|y$ will be essentially
the same as X, regardless of the value of y, because $\Pr(X=x \mid Y=y)$ is equal
to $\Pr(X=x)$ by (8.5); that's what independence means. But if X and Y are
dependent, the random variables $X|y$ and $X|y'$ need not resemble each other
in any way when $y \ne y'$.

If  X takes  only  nonnegative  integer  values,  we  can  decompose  its  pgf  into 
a sum  of  conditional  pgf’s  with  respect  to  any  other  random  variable  Y: 

$$G_X(z) = \sum_{y\in Y(\Omega)} \Pr(Y=y)\, G_{X|y}(z). \tag{8.91}$$

This holds because the coefficient of $z^x$ on the left side is $\Pr(X=x)$, for all
$x \in X(\Omega)$, and on the right it is

$$\sum_{y\in Y(\Omega)} \Pr(Y=y)\Pr(X=x \mid Y=y) = \sum_{y\in Y(\Omega)} \Pr(X=x \text{ and } Y=y) = \Pr(X=x).$$

For  example,  if  X is  the  product  of  the  spots  on  two  fair  dice  and  if  Y is  the 
sum  of  the  spots,  the  pgf  for  X|6  is 

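$$G_{X|6}(z) = \tfrac25 z^5 + \tfrac25 z^8 + \tfrac15 z^9$$

because the conditional probabilities for $Y = 6$ consist of five equally probable
events {(1,5), (2,4), (3,3), (4,2), (5,1)}. Equation (8.91) in this case
reduces to

$$G_X(z) = \tfrac1{36}G_{X|2}(z) + \tfrac2{36}G_{X|3}(z) + \tfrac3{36}G_{X|4}(z) + \tfrac4{36}G_{X|5}(z)
 + \tfrac5{36}G_{X|6}(z) + \tfrac6{36}G_{X|7}(z) + \tfrac5{36}G_{X|8}(z) + \tfrac4{36}G_{X|9}(z)
 + \tfrac3{36}G_{X|10}(z) + \tfrac2{36}G_{X|11}(z) + \tfrac1{36}G_{X|12}(z),$$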


Oh, now I understand
what
mathematicians
mean when they
say something is
“obvious,” “clear,”
or “trivial.”


a formula  that  is  obvious  once  you  understand  it.  (End  of  digression.) 
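The decomposition (8.91) for the dice example can be checked by direct enumeration. This is only an illustrative sketch: a pgf is represented here as a dictionary from exponents to exact probabilities, and the helper name `pgf_of_product` is an assumption of this sketch, not notation from the text.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))    # 36 equally likely outcomes

def pgf_of_product(outcomes):
    """pgf of X = product of spots, as {exponent: probability}, uniform over outcomes."""
    pgf = defaultdict(Fraction)
    for a, b in outcomes:
        pgf[a * b] += Fraction(1, len(outcomes))
    return dict(pgf)

g_x = pgf_of_product(rolls)                     # unconditional pgf of X

pr_y = {}
g_x_given = {}
for y in range(2, 13):
    matching = [r for r in rolls if sum(r) == y]
    pr_y[y] = Fraction(len(matching), 36)
    g_x_given[y] = pgf_of_product(matching)     # conditional pgf of X|y

# Right-hand side of (8.91): mix the conditional pgfs with weights Pr(Y = y).
mixture = defaultdict(Fraction)
for y, cond in g_x_given.items():
    for exponent, prob in cond.items():
        mixture[exponent] += pr_y[y] * prob

print(g_x_given[6])
print(dict(mixture) == g_x)
```

Exact rational arithmetic is used so that the comparison of the two sides is an equality test rather than a floating-point approximation.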

In the case of hashing, (8.91) tells us how to write down the pgf for probes
in a successful search, if we let X = P and Y = K. For any fixed k between 1
and n, the random variable $P|k$ is defined as a sum of independent random
variables $X_1 + \cdots + X_k$; this is (8.88). Each $X_j$ with $j < k$ is 1 with
probability $1/m$ and 0 otherwise, while $X_k = 1$. So $P|k$ has the pgf

$$G_{P|k}(z) = \Bigl(\frac{m-1+z}{m}\Bigr)^{k-1} z.$$

Therefore the pgf for P itself is


$$G_P(z) = \sum_{k=1}^{n} s_k G_{P|k}(z) = z\sum_{k=1}^{n} s_k \Bigl(\frac{m-1+z}{m}\Bigr)^{k-1} = z\,S\Bigl(\frac{m-1+z}{m}\Bigr), \tag{8.92}$$

where

$$S(z) = s_1 + s_2 z + s_3 z^2 + \cdots + s_n z^{n-1} \tag{8.93}$$


is  the  pgf  for  the  search  probabilities  (divided  by  z for  convenience). 

Good.  We  have  a probability  generating  function  for  P;  we  can  now  find 
the  mean  and  variance  by  differentiation.  It’s  somewhat  easier  to  remove  the 
z factor  first,  as  we’ve  done  before,  thus  finding  the  mean  and  variance  of 
P — 1 instead: 


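$$F(z) = G_P(z)/z = S\Bigl(\frac{m-1+z}{m}\Bigr);$$

$$F'(z) = \frac1m\, S'\Bigl(\frac{m-1+z}{m}\Bigr);$$

$$F''(z) = \frac1{m^2}\, S''\Bigl(\frac{m-1+z}{m}\Bigr).$$

Therefore

$$EP = 1 + \mathrm{Mean}(F) = 1 + F'(1) = 1 + m^{-1}\,\mathrm{Mean}(S); \tag{8.94}$$

$$VP = \mathrm{Var}(F) = F''(1) + F'(1) - F'(1)^2
 = m^{-2}S''(1) + m^{-1}S'(1) - m^{-2}S'(1)^2
 = m^{-2}\,\mathrm{Var}(S) + (m^{-1} - m^{-2})\,\mathrm{Mean}(S). \tag{8.95}$$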

These  are  general  formulas  expressing  the  mean  and  variance  of  the  num- 
ber of  probes  P in  terms  of  the  mean  and  variance  of  the  assumed  search 
distribution  S. 

For example, suppose we have $s_k = 1/n$ for $1 \le k \le n$. This means
we are doing a purely “random” successful search, with all keys in the table
equally likely. Then $S(z)$ is the uniform probability distribution $U_n(z)$ in
(8.32), and we have $\mathrm{Mean}(S) = (n-1)/2$, $\mathrm{Var}(S) = (n^2-1)/12$. Hence


$$EP = \frac{n-1}{2m} + 1; \tag{8.96}$$

$$VP = \frac{n^2-1}{12m^2} + \frac{(m-1)(n-1)}{2m^2} = \frac{(n-1)(6m+n-5)}{12m^2}. \tag{8.97}$$


Once again we have gained the desired speedup factor of 1/m. If $m = n/\ln n$
and $n \to \infty$, the average number of probes per successful search in this case
is about $\frac12 \ln n$, and the standard deviation is asymptotically $(\ln n)/\sqrt{12}$.

On the other hand, we might suppose that $s_k = (kH_n)^{-1}$ for $1 \le k \le n$;
this distribution is called “Zipf's law.” Then the pgf $G(z) = zS(z)$ has
$\mathrm{Mean}(G) = n/H_n$ and $\mathrm{Var}(G) = \frac12 n(n+1)/H_n - n^2/H_n^2$, so that
$\mathrm{Mean}(S) = n/H_n - 1$ and $\mathrm{Var}(S) = \mathrm{Var}(G)$. The average number of probes
for $m = n/\ln n$ as $n \to \infty$ is approximately 2, with standard deviation
asymptotic to $\sqrt{\ln n}/\sqrt2$.

In  both  cases  the  analysis  allows  the  cautious  souls  among  us,  who  fear 
the  worst  case,  to  rest  easily:  Chebyshev’s  inequality  tells  us  that  the  lists 
will  be  nice  and  short,  except  in  extremely  rare  cases. 
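For small tables the general formulas (8.94) and (8.95) can be compared with an exhaustive computation over every hash sequence and every sought key. The sketch below is illustrative only; the function names are hypothetical, hash values are numbered from 0 for convenience, and `s` holds the search probabilities $s_1, \ldots, s_n$ as exact fractions.

```python
from fractions import Fraction
from itertools import product

def probes(h, k):
    """P(h_1..h_n; k): keys among the first k insertions that share key k's hash value."""
    return sum(1 for j in range(k) if h[j] == h[k - 1])

def brute_force(m, s):
    """Exact mean and variance of P over all m^n hash sequences and all k."""
    n = len(s)
    weight = Fraction(1, m ** n)
    first = second = Fraction(0)
    for h in product(range(m), repeat=n):
        for k in range(1, n + 1):
            p = probes(h, k)
            first += weight * s[k - 1] * p
            second += weight * s[k - 1] * p * p
    return first, second - first ** 2

def closed_form(m, s):
    """Mean and variance from (8.94) and (8.95), with S(z) = sum of s_k z^(k-1)."""
    mean_s = sum(i * sk for i, sk in enumerate(s))                # i = k - 1
    var_s = sum(i * i * sk for i, sk in enumerate(s)) - mean_s ** 2
    return (1 + mean_s / m,
            var_s / m ** 2 + (Fraction(1, m) - Fraction(1, m ** 2)) * mean_s)

uniform = [Fraction(1, 4)] * 4
skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
for s in (uniform, skewed):
    print(brute_force(3, s), closed_form(3, s))
```

With m = 3 and the uniform distribution on n = 4 keys, both computations agree with (8.96) and (8.97).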


OK,  gang,  time 
to  put  on  your 
skim  suits  again. 
-Friendly  TA 


Case  2,  continued:  Variants  of  the  variance. 

We have just computed the variance of the number of probes in a successful
search, by considering P to be a random variable over a probability space
with $m^n \cdot n$ elements $(h_1, \ldots, h_n; k)$. But we could have adopted another point
of view: Each pattern $(h_1, \ldots, h_n)$ of hash values defines a random variable
$P|(h_1, \ldots, h_n)$, representing the probes we make in a successful search of a
particular hash table on n given keys. The average value of $P|(h_1, \ldots, h_n)$,

$$A(h_1, \ldots, h_n) = \sum_{p=1}^{n} p \cdot \Pr\bigl((P|(h_1, \ldots, h_n)) = p\bigr), \tag{8.98}$$

can be said to represent the running time of a successful search. This quantity
$A(h_1, \ldots, h_n)$ is a random variable that depends only on $(h_1, \ldots, h_n)$, not on
the final component k; we can write it in the form

$$A(h_1, \ldots, h_n) = \sum_{k=1}^{n} s_k\, P(h_1, \ldots, h_n; k),$$

since $P|(h_1, \ldots, h_n) = p$ with probability

$$\frac{\sum_{k=1}^{n} \Pr\bigl(P(h_1, \ldots, h_n; k) = p\bigr)}{\sum_{k=1}^{n} \Pr(h_1, \ldots, h_n; k)}
 = \frac{\sum_{k=1}^{n} m^{-n} s_k\, [P(h_1, \ldots, h_n; k) = p]}{\sum_{k=1}^{n} m^{-n} s_k}
 = \sum_{k=1}^{n} s_k\, [P(h_1, \ldots, h_n; k) = p].$$
The mean value of $A(h_1, \ldots, h_n)$, obtained by summing over all $m^n$
possibilities $(h_1, \ldots, h_n)$ and dividing by $m^n$, will be the same as the mean value
we obtained before in (8.94). But the variance of $A(h_1, \ldots, h_n)$ is something
different; this is a variance of $m^n$ averages, not a variance of $m^n \cdot n$ probe
counts. For example, if m = 1 (so that there is only one list), the “average”
value $A(h_1, \ldots, h_n) = A(1, \ldots, 1)$ is actually constant, so its variance VA is
zero; but the number of probes in a successful search is not constant, so the
variance VP is nonzero.

We can illustrate this difference between variances by carrying out the
calculations for general m and n in the simplest case, when $s_k = 1/n$ for
$1 \le k \le n$. In other words, we will assume temporarily that there is a uniform
distribution of search keys. Any given sequence of hash values $(h_1, \ldots, h_n)$
defines m lists that contain respectively $(n_1, n_2, \ldots, n_m)$ entries for some
numbers $n_j$, where


But  the  VP  is 
nonzero  only  in  an 
election  year. 


$$n_1 + n_2 + \cdots + n_m = n.$$

A successful  search  in  which  each  of  the  n keys  in  the  table  is  equally  likely 
will  have  an  average  running  time  of 


$$A(h_1, \ldots, h_n) = \frac{(1 + \cdots + n_1) + (1 + \cdots + n_2) + \cdots + (1 + \cdots + n_m)}{n}
 = \frac{n_1(n_1+1) + n_2(n_2+1) + \cdots + n_m(n_m+1)}{2n}
 = \frac{n_1^2 + n_2^2 + \cdots + n_m^2 + n}{2n}$$


probes. Our goal is to calculate the variance of this quantity $A(h_1, \ldots, h_n)$,
over the probability space consisting of all $m^n$ sequences $(h_1, \ldots, h_n)$.

The  calculations  will  be  simpler,  it  turns  out,  if  we  compute  the  variance 
of  a slightly  different  quantity, 


$$B(h_1, \ldots, h_n) = \binom{n_1}{2} + \binom{n_2}{2} + \cdots + \binom{n_m}{2}.$$

We have

$$A(h_1, \ldots, h_n) = 1 + B(h_1, \ldots, h_n)/n,$$


hence  the  mean  and  variance  of  A satisfy 


$$EA = 1 + \frac{EB}{n}; \qquad VA = \frac{VB}{n^2}. \tag{8.99}$$
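The relation between A and B can be checked on the sixteen-key example used earlier in this section. This is a small sketch under the uniform assumption $s_k = 1/n$; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import comb

hashes = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
n = len(hashes)
list_sizes = Counter(hashes)      # n_i for each nonempty list

# A: average number of probes when every key is equally likely to be sought.
a = Fraction(sum(size * (size + 1) // 2 for size in list_sizes.values()), n)
# B: sum of binomial(n_i, 2) over the lists.
b = sum(comb(size, 2) for size in list_sizes.values())

print(a, b, 1 + Fraction(b, n))
```

Here the 26 total probes over 16 keys give A = 13/8, and B = 10, so that 1 + B/n is again 13/8.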
The probability that the list sizes will be $n_1, n_2, \ldots, n_m$ is the multinomial
coefficient

$$\binom{n}{n_1, n_2, \ldots, n_m} = \frac{n!}{n_1!\, n_2! \cdots n_m!}$$

divided by $m^n$; hence the pgf for $B(h_1, \ldots, h_n)$ is

$$B_n(z) = \sum_{\substack{n_1, n_2, \ldots, n_m \ge 0 \\ n_1 + n_2 + \cdots + n_m = n}} \binom{n}{n_1, n_2, \ldots, n_m}\, z^{\binom{n_1}{2} + \binom{n_2}{2} + \cdots + \binom{n_m}{2}}\, m^{-n}.$$


This  sum  looks  a bit  scary  to  inexperienced  eyes,  but  our  experiences  in 
Chapter  7 have  taught  us  to  recognize  it  as  an  m-fold  convolution.  Indeed,  if 
we  consider  the  exponential  super-generating  function 


$$G(w, z) = \sum_{n \ge 0} B_n(z)\, \frac{m^n w^n}{n!},$$

we can readily verify that $G(w, z)$ is simply an mth power:

$$G(w, z) = \Bigl(\sum_{k \ge 0} z^{\binom{k}{2}}\, \frac{w^k}{k!}\Bigr)^{m}.$$

As a check, we can try setting z = 1; we get $G(w, 1) = (e^w)^m$, so the coefficient
of $m^n w^n/n!$ is $B_n(1) = 1$.

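If we knew the values of $B_n'(1)$ and $B_n''(1)$, we would be able to calculate
$\mathrm{Var}(B_n)$. So we take partial derivatives of $G(w, z)$ with respect to z:

$$\frac{\partial}{\partial z} G(w, z) = \sum_{n \ge 0} B_n'(z)\, \frac{m^n w^n}{n!}
 = m \Bigl(\sum_{k \ge 0} z^{\binom{k}{2}} \frac{w^k}{k!}\Bigr)^{m-1} \sum_{k \ge 0} \binom{k}{2} z^{\binom{k}{2}-1} \frac{w^k}{k!};$$

$$\frac{\partial^2}{\partial z^2} G(w, z) = \sum_{n \ge 0} B_n''(z)\, \frac{m^n w^n}{n!}
 = m(m-1) \Bigl(\sum_{k \ge 0} z^{\binom{k}{2}} \frac{w^k}{k!}\Bigr)^{m-2} \Bigl(\sum_{k \ge 0} \binom{k}{2} z^{\binom{k}{2}-1} \frac{w^k}{k!}\Bigr)^{2}
 + m \Bigl(\sum_{k \ge 0} z^{\binom{k}{2}} \frac{w^k}{k!}\Bigr)^{m-1} \sum_{k \ge 0} \binom{k}{2}\Bigl(\binom{k}{2} - 1\Bigr) z^{\binom{k}{2}-2} \frac{w^k}{k!}.$$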
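Complicated, yes; but everything simplifies greatly when we set z = 1. For
example, we have

$$\sum_{n \ge 0} B_n'(1)\, \frac{m^n w^n}{n!}
 = m e^{(m-1)w} \sum_{k \ge 2} \frac{w^k}{2(k-2)!}
 = m e^{(m-1)w} \sum_{k \ge 0} \frac{w^{k+2}}{2\,k!}
 = \frac{m w^2 e^{(m-1)w} e^{w}}{2}
 = \sum_{n \ge 0} \frac{(mw)^{n+2}}{2m\, n!}
 = \sum_{n \ge 0} \frac{n(n-1)\, m^n w^n}{2m\, n!},$$

and it follows that

$$B_n'(1) = \binom{n}{2} \frac{1}{m}. \tag{8.100}$$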

The  expression  for  EA  in  (8.99)  now  gives  EA  = 1 + (n-  1)/2m,  in  agreement 
with  (8.96). 

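The formula for $B_n''(1)$ involves the similar sum

$$\sum_{k \ge 0} \binom{k}{2}\Bigl(\binom{k}{2} - 1\Bigr) \frac{w^k}{k!}
 = \frac14 \sum_{k \ge 0} \frac{(k+1)k(k-1)(k-2)\, w^k}{k!}
 = \frac14 \sum_{k \ge 3} \frac{(k+1)\, w^k}{(k-3)!}
 = \frac14 \sum_{k \ge 0} \frac{(k+4)\, w^{k+3}}{k!}
 = \bigl(\tfrac14 w^4 + w^3\bigr) e^{w};$$

hence we find that

$$\sum_{n \ge 0} B_n''(1)\, \frac{m^n w^n}{n!}
 = m(m-1)\, e^{w(m-2)} \bigl(\tfrac12 w^2 e^w\bigr)^2 + m\, e^{w(m-1)} \bigl(\tfrac14 w^4 + w^3\bigr) e^w
 = m\, e^{wm} \bigl(\tfrac14 m w^4 + w^3\bigr);$$

$$B_n''(1) = \binom{n}{2}\Bigl(\binom{n}{2} - 1\Bigr) \frac{1}{m^2}. \tag{8.101}$$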


Now  we  can  put  all  the  pieces  together  and  evaluate  the  desired  variance  VA. 
Massive  cancellation  occurs,  and  the  result  is  surprisingly  simple: 


$$VA = \frac{VB}{n^2} = \frac{B_n''(1) + B_n'(1) - B_n'(1)^2}{n^2}
 = \frac{n(n-1)}{m^2 n^2} \Bigl(\frac{(n+1)(n-2)}{4} + \frac{m}{2} - \frac{n(n-1)}{4}\Bigr)
 = \frac{(m-1)(n-1)}{2m^2 n}. \tag{8.102}$$
Where have I seen
that pattern before?

Where have I seen
that graffito before?

IttvPtt' 


When  such  “coincidences”  occur,  we  suspect  that  there’s  a mathematical 
reason;  there  might  be  another  way  to  attack  the  problem,  explaining  why 
the  answer  has  such  a simple  form.  And  indeed,  there  is  another  approach  (in 
exercise  60),  which  shows  that  the  variance  of  the  average  successful  search 
has  the  general  form 

$$VA = \frac{m-1}{m^2} \sum_{k=1}^{n} s_k^2\, (k-1) \tag{8.103}$$

when $s_k$ is the probability that the kth-inserted element is being sought.
Equation (8.102) is the special case $s_k = 1/n$ for $1 \le k \le n$.
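Formula (8.103) is easy to test on small cases by enumerating all $m^n$ hash sequences and computing the variance of the average directly. The following sketch is illustrative only; the function names are hypothetical, and hash values are numbered from 0.

```python
from fractions import Fraction
from itertools import product

def probes(h, k):
    """P(h_1..h_n; k): keys among the first k insertions that share key k's hash value."""
    return sum(1 for j in range(k) if h[j] == h[k - 1])

def variance_of_average(m, s):
    """VA: variance, over all m^n hash sequences, of A(h) = sum over k of s_k P(h; k)."""
    n = len(s)
    weight = Fraction(1, m ** n)
    first = second = Fraction(0)
    for h in product(range(m), repeat=n):
        a = sum(s[k - 1] * probes(h, k) for k in range(1, n + 1))
        first += weight * a
        second += weight * a * a
    return second - first ** 2

def formula_8_103(m, s):
    """(m-1)/m^2 times the sum of s_k^2 (k-1)."""
    return Fraction(m - 1, m * m) * sum(sk * sk * i for i, sk in enumerate(s))  # i = k - 1

uniform = [Fraction(1, 4)] * 4
skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
print(variance_of_average(3, uniform), formula_8_103(3, uniform))
print(variance_of_average(3, skewed), formula_8_103(3, skewed))
```

For m = 3 and uniform $s_k$ with n = 4, both values equal $(m-1)(n-1)/(2m^2n) = 1/12$, in agreement with (8.102).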

Besides  the  variance  of  the  average,  we  might  also  consider  the  average  of 
the  variance.  In  other  words,  each  sequence  (hi  , . . . , hn)  that  defines  a hash 
table  also  defines  a probability  distribution  for  successful  searching,  and  the 
variance  of  this  probability  distribution  tells  how  spread  out  the  number  of 
probes  will  be  in  different  successful  searches.  For  example,  let’s  go  back  to 
the  case  where  we  inserted  n = 16  things  into  m =10  lists: 

$(h_1, \ldots, h_{16})$ = 3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3
$(P_1, \ldots, P_{16})$ = 1 1 1 2 1 1 1 1 2 2 3 1 2 1 3 3

A successful search in the resulting hash table has the pgf

$$G(3, 1, 4, 1, \ldots, 3) = \sum_{k=1}^{16} s_k z^{P(3, 1, 4, 1, \ldots, 3;\, k)}
 = s_1 z + s_2 z + s_3 z + s_4 z^2 + \cdots + s_{16} z^3.$$

We  have  just  considered  the  average  number  of  probes  in  a successful  search 
of  this  table,  namely  A(3, 1,4, 1 ,...,  3)  = Mean(G(3, 1 ,4, 1 ,. . . ,3)).  We  can 
also  consider  the  variance, 

$$s_1 \cdot 1^2 + s_2 \cdot 1^2 + s_3 \cdot 1^2 + s_4 \cdot 2^2 + \cdots + s_{16} \cdot 3^2
 - (s_1 \cdot 1 + s_2 \cdot 1 + s_3 \cdot 1 + s_4 \cdot 2 + \cdots + s_{16} \cdot 3)^2.$$

This variance is a random variable, depending on $(h_1, \ldots, h_n)$, so it is natural
to consider its average value.

In other words, there are three natural kinds of variance that we may
wish to know, in order to understand the behavior of a successful search: The
overall variance of the number of probes, taken over all $(h_1, \ldots, h_n)$ and k;
the variance of the average number of probes, where the average is taken
over all k and the variance is then taken over all $(h_1, \ldots, h_n)$; and the average
of the variance of the number of probes, where the variance is taken over
all k and the average is then taken over all $(h_1, \ldots, h_n)$. In symbols, the
overall variance is

$$VP = \sum_{1 \le h_1, \ldots, h_n \le m}\ \sum_{k=1}^{n} \frac{s_k}{m^n}\, P(h_1, \ldots, h_n; k)^2
 - \Bigl(\sum_{1 \le h_1, \ldots, h_n \le m}\ \sum_{k=1}^{n} \frac{s_k}{m^n}\, P(h_1, \ldots, h_n; k)\Bigr)^{2};$$

the variance of the average is

$$VA = \sum_{1 \le h_1, \ldots, h_n \le m} \frac{1}{m^n} \Bigl(\sum_{k=1}^{n} s_k P(h_1, \ldots, h_n; k)\Bigr)^{2}
 - \Bigl(\sum_{1 \le h_1, \ldots, h_n \le m} \frac{1}{m^n} \sum_{k=1}^{n} s_k P(h_1, \ldots, h_n; k)\Bigr)^{2};$$

and the average of the variance is

$$AV = \sum_{1 \le h_1, \ldots, h_n \le m} \frac{1}{m^n} \Bigl(\sum_{k=1}^{n} s_k P(h_1, \ldots, h_n; k)^2
 - \Bigl(\sum_{k=1}^{n} s_k P(h_1, \ldots, h_n; k)\Bigr)^{2}\Bigr).$$

It turns out that these three quantities are interrelated in a simple way:

$$VP = VA + AV. \tag{8.104}$$

In fact, conditional probability distributions always satisfy the identity

$$VX = V(E(X|Y)) + E(V(X|Y)) \tag{8.105}$$

if X and Y are random variables in any probability space and if X takes real
values. (This identity is proved in exercise 22.) Equation (8.104) is the
special case where X is the number of probes in a successful search and Y is
the sequence of hash values $(h_1, \ldots, h_n)$.
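The three variances and the identity VP = VA + AV can be computed exactly for a small table. The sketch below enumerates every hash sequence; the function names are hypothetical, hash values are numbered from 0, and `s` holds $s_1, \ldots, s_n$.

```python
from fractions import Fraction
from itertools import product

def probes(h, k):
    """P(h_1..h_n; k): keys among the first k insertions whose hash equals that of key k."""
    return sum(1 for j in range(k) if h[j] == h[k - 1])

def three_variances(m, s):
    """Return (VP, VA, AV) exactly, for m lists and search probabilities s_1..s_n."""
    n = len(s)
    weight = Fraction(1, m ** n)              # each hash sequence is equally likely
    mean = second = mean_sq_of_avg = av = Fraction(0)
    for h in product(range(m), repeat=n):
        p = [probes(h, k) for k in range(1, n + 1)]
        avg_h = sum(sk * pk for sk, pk in zip(s, p))            # A(h_1..h_n)
        second_h = sum(sk * pk * pk for sk, pk in zip(s, p))
        mean += weight * avg_h
        second += weight * second_h
        mean_sq_of_avg += weight * avg_h ** 2
        av += weight * (second_h - avg_h ** 2)                  # variance within this table
    return second - mean ** 2, mean_sq_of_avg - mean ** 2, av

vp, va, av = three_variances(3, [Fraction(1, 4)] * 4)
print(vp, va, av, vp == va + av)
```

For m = 3 and four equally likely keys, the overall variance 17/36 splits into 1/12 (variance of the average) plus 7/18 (average of the variance).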

The general equation (8.105) needs to be understood carefully, because
the notation tends to conceal the different random variables and probability
spaces in which expectations and variances are being calculated. For each y
in the range of Y, we have defined the random variable $X|y$ in (8.90), and this
random variable has an expected value $E(X|y)$ depending on y. Now $E(X|Y)$
denotes the random variable whose values are $E(X|y)$ as y ranges over all
(Now is a good
time to do warmup
exercise 6.)


P is still the number
of probes.


possible values of Y, and $V(E(X|Y))$ is the variance of this random variable
with respect to the probability distribution of Y. Similarly, $E(V(X|Y))$ is the
average of the random variables $V(X|y)$ as y varies. On the left of (8.105)
is VX, the unconditional variance of X. Since variances are nonnegative, we
always have

$$VX \ge V(E(X|Y)) \quad\text{and}\quad VX \ge E(V(X|Y)). \tag{8.106}$$

Case  1,  again:  Unsuccessful  search  revisited. 

Let’s  bring  our  microscopic  examination  of  hashing  to  a close  by  doing  one 
more  calculation  typical  of  algorithmic  analysis.  This  time  we’ll  look  more 
closely  at  the  total  running  time  associated  with  an  unsuccessful  search, 
assuming  that  the  computer  will  insert  the  previously  unknown  key  into  its 
memory. 

The insertion process in (8.83) has two cases, depending on whether j is
negative or zero. We have j < 0 if and only if P = 0, since a negative value
comes from the FIRST entry of an empty list. Thus, if the list was previously
empty, we have P = 0 and we must set FIRST[$h_{n+1}$] := n + 1. (The new
record will be inserted into position n + 1.) Otherwise we have P > 0 and we
must set a LINK entry to n + 1. These two cases may take different amounts
of time; therefore the total running time for an unsuccessful search has the
form

$$T = \alpha + \beta P + \delta\,[P = 0], \tag{8.107}$$

where $\alpha$, $\beta$, and $\delta$ are constants that depend on the computer being used and
on the way in which hashing is encoded in that machine's internal language.
It would be nice to know the mean and variance of T, since such information
is more relevant in practice than the mean and variance of P.

So far we have used probability generating functions only in connection
with random variables that take nonnegative integer values. But it turns out
that we can deal in essentially the same way with

$$G_X(z) = \sum_{\omega \in \Omega} \Pr(\omega)\, z^{X(\omega)}$$

when X is any real-valued random variable, because the essential characteristics
of X depend only on the behavior of $G_X$ near z = 1, where powers of z are
well defined. For example, the running time (8.107) of an unsuccessful search
is a random variable, defined on the probability space of equally likely hash
values $(h_1, \ldots, h_n; h_{n+1})$ with $1 \le h_j \le m$; we can consider the series

$$G_T(z) = \frac{1}{m^{n+1}} \sum_{h_1=1}^{m} \cdots \sum_{h_n=1}^{m}\ \sum_{h_{n+1}=1}^{m} z^{\alpha + \beta P(h_1, \ldots, h_n;\, h_{n+1}) + \delta\,[P(h_1, \ldots, h_n;\, h_{n+1}) = 0]}$$

to be a pgf even when $\alpha$, $\beta$, and $\delta$ are not integers. (In fact, the parameters
$\alpha$, $\beta$, $\delta$ are physical quantities that have dimensions of time; they aren't even
pure numbers! Yet we can use them in the exponent of z.) We can still
calculate the mean and variance of T, by evaluating $G_T'(1)$ and $G_T''(1)$ and
combining these values in the usual way.

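The generating function for P instead of T is

$$P(z) = \Bigl(\frac{m-1+z}{m}\Bigr)^{n} = \sum_{p \ge 0} \Pr(P = p)\, z^{p}.$$

Therefore we have

$$G_T(z) = \sum_{p \ge 0} \Pr(P = p)\, z^{\alpha + \beta p + \delta [p = 0]}
 = z^{\alpha} \Bigl((z^{\delta} - 1) \Pr(P = 0) + \sum_{p \ge 0} \Pr(P = p)\, z^{\beta p}\Bigr)
 = z^{\alpha} \Bigl((z^{\delta} - 1) \Bigl(\frac{m-1}{m}\Bigr)^{n} + \Bigl(\frac{m-1+z^{\beta}}{m}\Bigr)^{n}\Bigr).$$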

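The determination of $\mathrm{Mean}(G_T)$ and $\mathrm{Var}(G_T)$ is now routine:

$$\mathrm{Mean}(G_T) = G_T'(1) = \alpha + \beta \frac{n}{m} + \delta \Bigl(\frac{m-1}{m}\Bigr)^{n}; \tag{8.108}$$

$$G_T''(1) = \alpha(\alpha - 1) + 2\alpha\beta \frac{n}{m} + \beta(\beta - 1) \frac{n}{m} + \beta^2 \frac{n(n-1)}{m^2}
 + 2\alpha\delta \Bigl(\frac{m-1}{m}\Bigr)^{n} + \delta(\delta - 1) \Bigl(\frac{m-1}{m}\Bigr)^{n};$$

$$\mathrm{Var}(G_T) = G_T''(1) + G_T'(1) - G_T'(1)^2
 = \beta^2 \frac{n(m-1)}{m^2} - 2\beta\delta \Bigl(\frac{m-1}{m}\Bigr)^{n} \frac{n}{m}
 + \delta^2 \Bigl(\Bigl(\frac{m-1}{m}\Bigr)^{n} - \Bigl(\frac{m-1}{m}\Bigr)^{2n}\Bigr).$$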

In Chapter 9 we will learn how to estimate quantities like this when
m and n are large. If, for example, m = n and $n \to \infty$, the techniques
of Chapter 9 will show that the mean and variance of T are respectively
$\alpha + \beta + \delta e^{-1} + O(n^{-1})$ and $\beta^2 - 2\beta\delta e^{-1} + \delta^2(e^{-1} - e^{-2}) + O(n^{-1})$. If $m = n/\ln n$
and $n \to \infty$ the corresponding results are

$$\mathrm{Mean}(G_T) = \beta \ln n + \alpha + \delta/n + O\bigl((\log n)^2/n^2\bigr);$$

$$\mathrm{Var}(G_T) = \beta^2 \ln n - \bigl((\beta \ln n)^2 + 2\beta\delta \ln n - \delta^2\bigr)/n + O\bigl((\log n)^3/n^2\bigr).$$
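The exact mean and variance of T can be checked numerically for small m and n. The sketch below assumes, as in the earlier analysis of unsuccessful search, that P has the binomial distribution with n trials and success probability 1/m; the function names and the sample values chosen for $\alpha$, $\beta$, $\delta$ are arbitrary illustrations.

```python
from fractions import Fraction
from math import comb

def mean_var_T(m, n, alpha, beta, delta):
    """Exact mean and variance of T = alpha + beta*P + delta*[P = 0], P ~ Binomial(n, 1/m)."""
    mean = second = Fraction(0)
    for p in range(n + 1):
        prob = Fraction(comb(n, p) * (m - 1) ** (n - p), m ** n)
        t = alpha + beta * p + (delta if p == 0 else 0)
        mean += prob * t
        second += prob * t * t
    return mean, second - mean ** 2

def closed_form_T(m, n, alpha, beta, delta):
    """Closed forms for the mean and variance of T in terms of q = ((m-1)/m)^n."""
    q = Fraction(m - 1, m) ** n
    mean = alpha + beta * Fraction(n, m) + delta * q
    var = (beta ** 2 * Fraction(n * (m - 1), m * m)
           - 2 * beta * delta * q * Fraction(n, m)
           + delta ** 2 * (q - q * q))
    return mean, var

alpha, beta, delta = Fraction(5, 2), Fraction(3, 4), Fraction(1, 3)
print(mean_var_T(5, 7, alpha, beta, delta))
print(closed_form_T(5, 7, alpha, beta, delta))
```

Because the parameters are exact fractions, agreement between the two functions is an exact equality rather than a rounding coincidence.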
Why only ten
numbers?

The  other  students 
either  weren’t 
empiricists  or 
they  were  just  too 
flipped  out. 


Exercises 

Warmups 

1 What's the probability of doubles in the probability distribution $\Pr_{01}$
of (8.3), when one die is fair and the other is loaded? What's the
probability that S = 7 is rolled?

2 What’s  the  probability  that  the  top  and  bottom  cards  of  a randomly  shuf- 
fled deck  are  both  aces?  (All  52!  permutations  have  probability  1/52!.) 

3 Stanford’s  Concrete  Math  students  were  asked  in  1979  to  flip  coins  until 
they  got  heads  twice  in  succession,  and  to  report  the  number  of  flips 
required.  The  answers  were 

3, 2, 3, 5, 10, 2, 6, 6, 9, 2.

Princeton’s  Concrete  Math  students  were  asked  in  1987  to  do  a similar 
thing,  with  the  following  results: 

10,  2,  10,  7,  5,  2,  10,  6,  10,  2. 

Estimate  the  mean  and  variance,  based  on  (a)  the  Stanford  sample; 
(b)  the  Princeton  sample. 

4 Let H(z) = F(z)/G(z), where F(1) = G(1) = 1. Prove that

Mean(H) = Mean(F) - Mean(G),

Var(H) = Var(F) - Var(G),

in analogy with (8.38) and (8.39), if the indicated derivatives exist at
z = 1.

5 Suppose  Alice  and  Bill  play  the  game  (8.78)  with  a biased  coin  that  comes 
up  heads  with  probability  p.  Is  there  a value  of  p for  which  the  game 
becomes  fair? 

6 What  does  the  conditional  variance  law  (8.105)  reduce  to,  when  X and  Y 
are  independent  random  variables? 

Basics 

7 Show that if two dice are loaded with the same probability distribution,
the probability of doubles is always at least 1/6.

8 Let A and B be events such that $A \cup B = \Omega$. Prove that

$\Pr(\omega \in A \cap B) = \Pr(\omega \in A)\Pr(\omega \in B) - \Pr(\omega \notin A)\Pr(\omega \notin B)$.

9 Prove  or  disprove:  If  X and  Y are  independent  random  variables,  then  so 
are  F(X)  and  G(Y),  when  F and  G are  any  functions. 
10 What's the maximum number of elements that can be medians of a
random variable X, according to definition (8.7)?

11 Construct a random variable that has finite mean and infinite variance.

12 a If P(z) is the pgf for the random variable X, prove that

$\Pr(X \le r) \le x^{-r} P(x)$ for $0 < x \le 1$;

$\Pr(X \ge r) \le x^{-r} P(x)$ for $x \ge 1$.

(These important relations are called the tail inequalities.)
b In the special case $P(z) = (1+z)^n/2^n$, use the first tail inequality to
prove that $\sum_{k \le \alpha n} \binom{n}{k} \le 1/\bigl(\alpha^{\alpha n}(1-\alpha)^{(1-\alpha)n}\bigr)$ when $0 < \alpha < \frac12$.

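13 If $X_1, \ldots, X_{2n}$ are independent random variables with the same
distribution, and if $\alpha$ is any real number whatsoever, prove that

$$\Pr\Bigl(\Bigl|\frac{X_1 + \cdots + X_{2n}}{2n} - \alpha\Bigr| \le \Bigl|\frac{X_1 + \cdots + X_n}{n} - \alpha\Bigr|\Bigr) \ge \frac12.$$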


14  Let  F(z)  and  G(z)  be  probability  generating  functions,  and  let 


H(z) = p F(z) + q G(z)


where p + q = 1. (This is called a mixture of F and G; it corresponds to
flipping a coin and choosing probability distribution F or G depending on
whether the coin comes up heads or tails.) Find the mean and variance
of H in terms of p, q, and the mean and variance of F and G.

15 If F(z) and G(z) are probability generating functions, we can define
another pgf H(z) by “composition”:


H(z)  = F(G(z)). 

Express  Mean(H)  and  Var(H)  in  terms  of  Mean(F),  Var(F),  Mean(G), 
and  Var(G).  (Equation  (8.92)  is  a special  case.) 

16 Find a closed form for the super generating function $\sum_{n \ge 0} F_n(z) w^n$,
where $F_n(z)$ is the football-fixation generating function defined in (8.53).

17 Let $X_{n,p}$ and $Y_{n,p}$ have the binomial and negative binomial distributions,
respectively, with parameters (n, p). (These distributions are defined in
(8.57) and (8.60).) Prove that $\Pr(Y_{n,p} \le m) = \Pr(X_{m+n,p} \ge n)$. What
identity in binomial coefficients does this imply?

18 A random variable X is said to have the Poisson distribution with
mean $\mu$ if $\Pr(X = k) = e^{-\mu}\mu^k/k!$ for all $k \ge 0$.

a What is the pgf of such a random variable?
b What are its mean, variance, and other cumulants?


The  distribution  of 
fish  per  unit  volume 
of  water. 
19 Continuing the previous exercise, let $X_1$ be a random Poisson variable
with mean $\mu_1$, and let $X_2$ be a random Poisson variable with mean $\mu_2$,
independent of $X_1$.

a What is the probability that $X_1 + X_2 = n$?

b What are the mean, variance, and other cumulants of $2X_1 + 3X_2$?

20  Prove  (8.74)  and  (8.75),  the  general  formulas  for  mean  and  variance  of 

the  time  needed  to  wait  for  a given  pattern  of  heads  and  tails. 

21 What does the value of N represent, if H and T are both set equal to $\frac12$
in (8.77)?

22 Prove (8.105), the law of conditional expectations and variances.

Homework  exercises 

23 Let $\Pr_{00}$ be the probability distribution of two fair dice, and let $\Pr_{11}$ be
the probability distribution of two loaded dice as given in (8.2). Find all
events A such that $\Pr_{00}(A) = \Pr_{11}(A)$. Which of these events depend
only on the random variable S? (A probability space with $\Omega = D^2$ has
$2^{36}$ events; only $2^{11}$ of those events depend on S alone.)

24  Player  J rolls  2n+  1 fair  dice  and  removes  those  that  come  up  □ . Player 

K then  calls  a number  between  1 and  6,  rolls  the  remaining  dice,  and 
removes  those  that  show  the  number  called.  This  process  is  repeated 
until  no  dice  remain.  The  player  who  has  removed  the  most  total  dice 
(n  + 1 or  more)  is  the  winner. 

a What  are  the  mean  and  variance  of  the  total  number  of  dice  that 
J removes?  Hint:  The  dice  are  independent, 
b What’s  the  probability  that  J wins,  when  n = 2? 

25 Consider a gambling game in which you stake a given amount A and you
roll a fair die. If k spots turn up, you multiply your stake by 2(k - 1)/5.
(In particular, you double the stake whenever you roll ⚅, but you lose
everything if you roll ⚀.) You can stop at any time and reclaim the
current stake. What are the mean and variance of your stake after n rolls?
(Ignore any effects of rounding to integer amounts of currency.)

26 Find the mean and variance of the number of l-cycles in a random
permutation of n elements. (The football victory problem discussed in (8.23),
(8.24), and (8.53) is the special case l = 1.)

27 Let $X_1, X_2, \ldots, X_n$ be independent samples of the random variable X.
Equations (8.19) and (8.20) explain how to estimate the mean and
variance of X on the basis of these observations; give an analogous formula
for estimating the third cumulant $\kappa_3$. (Your formula should be an
“unbiased” estimate, in the sense that its expected value should be $\kappa_3$.)
28 What is the average length of the coin-flipping game (8.78)

a given  that  Alice  wins? 

b given  that  Bill  wins? 

29 Alice, Bill, and Computer flip a fair coin until one of the respective
patterns A = HHTH, B = HTHH, or C = THHH appears for the first time.
(If only two of these patterns were involved, we know from (8.82) that A
would probably beat B, that B would probably beat C, and that C would
probably beat A; but all three patterns are simultaneously in the game.)
What are each player's chances of winning?

30 The text considers three kinds of variances associated with successful
search in a hash table. Actually there are two more: We can consider the
average (over k) of the variances (over $h_1, \ldots, h_n$) of $P(h_1, \ldots, h_n; k)$; and
we can consider the variance (over k) of the averages (over $h_1, \ldots, h_n$).
Evaluate these quantities.

31 An apple is located at vertex A of pentagon ABCDE, and a worm is
located two vertices away, at C. Every day the worm crawls with equal
probability to one of the two adjacent vertices. Thus after one day the
worm is at vertex B with probability 1/2 and at vertex D with probability 1/2.
After two days, the worm might be back at C again, because it has no
memory of previous positions. When it reaches vertex A, it stops to dine.
a What are the mean and variance of the number of days until dinner?
b Let p be the probability that the number of days is 100 or more.
What does Chebyshev's inequality say about p?
c What do the tail inequalities (exercise 12) tell us about p?

32  Alice  and  Bill  are  in  the  military,  stationed  in  one  of  the  five  states 

Kansas,  Nebraska,  Missouri,  Oklahoma,  or  Colorado.  Initially  Alice  is  in 
Nebraska  and  Bill  is  in  Oklahoma.  Every  month  each  person  is  reassigned 
to  an  adjacent  state,  each  adjacent  state  being  equally  likely.  (Here’s  a 
diagram  of  the  adjacencies: 


The initial states are circled.) For example, Alice is restationed after the
first month to Colorado, Kansas, or Missouri, each with probability 1/3.
Find  the  mean  and  variance  of  the  number  of  months  it  takes  Alice  and 
Bill  to  find  each  other.  (You  may  wish  to  enlist  a computer’s  help.) 


Schrödinger's worm.


Definitely  a finite- 
state  situation. 
(Use a calculator for
the  numerical  work 
on  this  problem.) 


33 Are the random variables $X_1$ and $X_2$ in (8.88) independent?

34 Gina is a golfer who has probability p = .05 on each stroke of making a
“supershot” that gains a stroke over par, probability q = .91 of making
an ordinary shot, and probability r = .04 of making a “subshot” that
costs her a stroke with respect to par. (Non-golfers: At each turn she
advances 2, 1, or 0 steps toward her goal, with probability p, q, or r,
respectively. On a par-m hole, her score is the minimum n such that she
has advanced m or more steps after taking n turns. A low score is better
than a high score.)

a Show that Gina wins a par-4 hole more often than she loses, when
she plays against a player who shoots par. (In other words, the
probability that her score is less than 4 is greater than the probability
that her score is greater than 4.)

b Show that her average score on a par-4 hole is greater than 4.
(Therefore she tends to lose against a “steady” player on total points,
although she would tend to win in match play by holes.)


Exam  problems 

35 A die has been loaded with the probability distribution

Pr(⚀) = $p_1$; Pr(⚁) = $p_2$; ...; Pr(⚅) = $p_6$.


Let $S_n$ be the sum of the spots after this die has been rolled n times. Find
a necessary and sufficient condition on the “loading distribution” such
that the two random variables $S_n \bmod 2$ and $S_n \bmod 3$ are independent
of each other, for all n.

36  The  six  faces  of  a certain  die  contain  the  spot  patterns 

□ □ © P 

instead  of  the  usual  Q through  Q 

a Show  that  there  is  a way  to  assign  spots  to  the  six  faces  of  another 
die  so  that,  when  these  two  dice  are  thrown,  the  sum  of  spots  has  the 
same  probability  distribution  as  the  sum  of  spots  on  two  ordinary 
dice.  (Assume  that  all  36  face  pairs  are  equally  likely.) 
b Generalizing, find all ways to assign spots to the 6n faces of n dice so
that the distribution of spot sums will be the same as the distribution
of spot sums on n ordinary dice. (Each face should receive a positive
integer number of spots.)

37 Let $p_n$ be the probability that exactly n tosses of a fair coin are needed
before heads are seen twice in a row, and let $q_n = \sum_{k \ge n} p_k$. Find closed
forms for both $p_n$ and $q_n$ in terms of Fibonacci numbers.
38 What is the probability generating function for the number of times you
need  to  roll  a fair  die  until  all  six  faces  have  turned  up?  Generalize  to 
m-sided  fair  dice:  Give  closed  forms  for  the  mean  and  variance  of  the 
number  of  rolls  needed  to  see  l of  the  m faces.  What  is  the  probability 
that  this  number  will  be  exactly  n? 

39  A Dirichlet  probability  generating  function  has  the  form 


    $P(z) = \sum_{n\ge 1} \frac{p_n}{n^z}.$

Thus $P(0) = 1$. If $X$ is a random variable with $\Pr(X=n) = p_n$, express
$E(X)$, $V(X)$, and $E(\ln X)$ in terms of $P(z)$ and its derivatives.

40 The $m$th cumulant $\kappa_m$ of the binomial distribution (8.57) has the form
$n f_m(p)$, where $f_m$ is a polynomial of degree $m$. (For example, $f_1(p) = p$
and $f_2(p) = p - p^2$, because the mean and variance are $np$ and $npq$.)

a Find a closed form for the coefficient of $p^k$ in $f_m(p)$.
b Prove that $f_m(\frac12) = (2^m - 1)B_m/m + [m=1]$, where $B_m$ is the $m$th
Bernoulli number.

41 Let the random variable $X_n$ be the number of flips of a fair coin until heads
have turned up a total of $n$ times. Show that
$E(X_{n+1}^{-1}) = (-1)^n(\ln 2 + H_{\lfloor n/2\rfloor} - H_n)$.
Use the methods of Chapter 9 to estimate this value with
an absolute error of $O(n^{-3})$.

42 A certain man has a problem finding work. If he is unemployed on
any given morning, there’s constant probability $p_h$ (independent of past
history) that he will be hired before that evening; but if he’s got a job
when the day begins, there’s constant probability $p_f$ that he’ll be laid
off by nightfall. Find the average number of evenings on which he will
have a job lined up, assuming that he is initially employed and that this
process goes on for $n$ days. (For example, if $n = 1$ the answer is $1 - p_f$.)

43 Find a closed form for the pgf $G_n(z) = \sum_{k\ge 0} p_{k,n} z^k$, where $p_{k,n}$ is the
probability that a random permutation of $n$ objects has exactly $k$ cycles.
What are the mean and standard deviation of the number of cycles?

44 The athletic department runs an intramural “knockout tournament” for
$2^n$ tennis players as follows. In the first round, the players are paired off
randomly, with each pairing equally likely, and $2^{n-1}$ matches are played.
The winners advance to the second round, where the same process produces
$2^{n-2}$ winners. And so on; the $k$th round has $2^{n-k}$ randomly chosen
matches between the $2^{n-k+1}$ players who are still undefeated. The $n$th
round produces the champion. Unbeknownst to the tournament organizers,
there is actually an ordering among the players, so that $x_1$ is best, $x_2$
is second best, ..., $x_{2^n}$ is worst. When $x_j$ plays $x_k$ and $j < k$, the winner
is $x_j$ with probability $p$ and $x_k$ with probability $1 - p$, independent of
the other matches. We assume that the same probability $p$ applies to all
$j$ and $k$.

a What’s the probability that $x_1$ wins the tournament?
b What’s the probability that the $n$th round (the final match) is between
the top two players, $x_1$ and $x_2$?
c What’s the probability that the best $2^k$ players are the competitors
in the $k$th-to-last round? (The previous questions were the cases
$k=0$ and $k=1$.)
d Let $N(n)$ be the number of essentially different tournament results;
two tournaments are essentially the same if the matches take place
between the same players and have the same winners. Prove that
$N(n) = 2^n!$.
e What’s the probability that $x_2$ wins the tournament?
f Prove that if $\frac12 < p < 1$, the probability that $x_j$ wins is strictly
greater than the probability that $x_{j+1}$ wins, for $1 \le j < 2^n$.

Does TeX choose
optimal line breaks?

A peculiar set of
tennis players.

“A fast arithmetic
computation shows
that the sherry is
always at least three
years old. Taking
computation further
gives the vertigo.”
-Revue du vin de
France (Nov 1984)

45 True sherry is made in Spain according to a multistage system called
“Solera.” For simplicity we’ll assume that the winemaker has only three
barrels, called A, B, and C. Every year a third of the wine from barrel C
is bottled and replaced by wine from B; then B is topped off with a third
of the wine from A; finally A is topped off with new wine. Let $A(z)$, $B(z)$,
$C(z)$ be probability generating functions, where the coefficient of $z^n$ is
the fraction of $n$-year-old wine in the corresponding barrel just after the
transfers have been made.

a Assume  that  the  operation  has  been  going  on  since  time  immemorial, 
so  that  we  have  a steady  state  in  which  A(z),  B(z),  and  C(z)  are  the 
same  at  the  beginning  of  each  year.  Find  closed  forms  for  these 
generating  functions. 

b Find  the  mean  and  standard  deviation  of  the  age  of  the  wine  in  each 
barrel,  under  the  same  assumptions.  What  is  the  average  age  of  the 
sherry  when  it  is  bottled?  How  much  of  it  is  exactly  25  years  old? 
c Now  take  the  finiteness  of  time  into  account:  Suppose  that  all  three 
barrels  contained  new  wine  at  the  beginning  of  year  0.  What  is  the 
average  age  of  the  sherry  that  is  bottled  at  the  beginning  of  year  n? 

46 Stefan Banach used to carry two boxes of matches, each containing
$n$ matches initially. Whenever he needed a light he chose a box at random,
each with probability $\frac12$, independent of his previous choices. After
taking out a match he’d put the box back in its pocket (even if the box
became empty; all famous mathematicians used to do this). When his
chosen box was empty he’d throw it away and reach for the other box.

a Once he found that the other box was empty too. What’s the probability
that this occurs? (For $n = 1$ it happens half the time and
for $n = 2$ it happens 3/8 of the time.) To answer this part, find a
closed form for the generating function $P(w,z) = \sum_{m,n} p_{m,n} w^m z^n$,
where $p_{m,n}$ is the probability that, starting with $m$ matches in one
box and $n$ in the other, both boxes are empty when an empty box
is first chosen. Then find a closed form for $p_{n,n}$.
b Generalizing your answer to part (a), find a closed form for the
probability that exactly $k$ matches are in the other box when an
empty one is first thrown away.

c Find  a closed  form  for  the  average  number  of  matches  in  that  other 
box. 

47 Some physicians, collaborating with some physicists, recently discovered
a pair of microbes that reproduce in a peculiar way. The male microbe,
called a diphage, has two receptors on its surface; the female microbe,
called a triphage, has three:

[Figure: a diphage, a triphage, and a receptor.]

When a culture of diphages and triphages is irradiated with a psi-particle,
exactly one of the receptors on one of the phages absorbs the particle;
each receptor is equally likely. If it was a diphage receptor, that diphage
changes to a triphage; if it was a triphage receptor, that triphage splits
into two diphages. Thus if an experiment starts with one diphage, the
first psi-particle changes it to a triphage, the second particle splits the
triphage into two diphages, and the third particle changes one of the
diphages to a triphage. The fourth particle hits either the diphage or
the triphage; then there are either two triphages (probability $\frac25$) or three
diphages (probability $\frac35$). Find a closed form for the average number
of diphages present, if we begin with a single diphage and irradiate the
culture $n$ times with single psi-particles.

48 Five people stand at the vertices of a pentagon, throwing frisbees to each
other. They have two frisbees, initially at adjacent vertices as shown. In each
time interval, each frisbee is thrown either to the left or to the right
(along an edge of the pentagon) with equal probability. This process
continues until one person is the target of two frisbees simultaneously;
then the game stops. (All throws are independent of past history.)

a Find the mean and variance of the number of pairs of throws.
b Find a closed form for the probability that the game lasts more than
100 steps, in terms of Fibonacci numbers.

And for the number
in the empty box.

Or, if this pentagon
is in Arlington,
throwing missiles
at each other.

Frisbee is a trademark
of Wham-O
Manufacturing
Company.

49  Luke  Snowwalker  spends  winter  vacations  at  his  mountain  cabin.  The 
front  porch  has  m pairs  of  boots  and  the  back  porch  has  n pairs.  Every 
time  he  goes  for  a walk  he  flips  a (fair)  coin  to  decide  whether  to  leave 
from  the  front  porch  or  the  back,  and  he  puts  on  a pair  of  boots  at  that 
porch  and  heads  off.  There’s  a 50/50  chance  that  he  returns  to  each 
porch,  independent  of  his  starting  point,  and  he  leaves  the  boots  at  the 
porch  he  returns  to.  Thus  after  one  walk  there  will  be  m -|-  [-1  , 0,  or  -fl] 
pairs  on  the  front  porch  and  n [+1, 0,  or  —1]  pairs  on  the  back  porch. 
If  all  the  boots  pile  up  on  one  porch  and  if  he  decides  to  leave  from 
the  other,  he  goes  without  boots  and  gets  frostbite,  ending  his  vacation. 
Assuming  that  he  continues  his  walks  until  the  bitter  end,  let  Pj^  (m,  n)  be 
the  probability  that  he  completes  exactly  N nonfrostbitten  trips,  starting 
with  m pairs  on  the  front  porch  and  n on  the  back.  Thus,  if  both  m 
and  n are  positive, 


PN(m,n)  = |PN-i(m-  1,u  + l)  + iPN^jm.n) 

+ |PN_i(m+  l,n-  1) ; 

this  follows  because  this  first  trip  is  either  front/back,  front/front,  back / 
back,  or  back/front,  each  with  probability  1,  and  N — 1 trips  remain, 
a Complete  the  recurrence  for  P^  (m,  n)  by  finding  formulas  that  hold 
when  m = 0 or  n = 0.  Use  the  recurrence  to  obtain  equations  that 
hold  among  the  probability  generating  functions 

gm,n(z)  = Y_  PN(m-,n)zN  . 

N)>0 

b Differentiate  your  equations  and  set  z = 1,  thereby  obtaining  rela- 
tions among  the  quantities  g^n(  1).  Solve  these  equations,  thereby 
determining  the  mean  number  of  trips  before  frostbite. 
c Show  that  gm  n has  a closed  form  if  we  substitute  Z = 1 /cos2  0: 

/ \ \ sin(2m  + 1 )0  + sin(2n  + 1 )0 

9m, n I 77 rl  = : — ts j — s COS  0 

vcos20/  sin(2m  + 2n  + 2)0 

50 Consider the function

    $H(z) = 1 + \frac{1-z}{2z}\Bigl(z - 3 + \sqrt{(1-z)(9-z)}\Bigr).$

The purpose of this problem is to prove that $H(z) = \sum_{k\ge 0} h_k z^k$ is a
probability generating function, and to obtain some basic facts about it.

a Let $(1-z)^{3/2}(9-z)^{1/2} = \sum_{k\ge 0} c_k z^k$. Prove that $c_0 = 3$, $c_1 = -14/3$,
$c_2 = 37/27$, and $c_{3+l} = 3\sum_k \binom{l}{k}\binom{1/2}{3+k}\bigl(\frac89\bigr)^{k+3}$ for all $l \ge 0$. Hint: Use
the identity

    $(9-z)^{1/2} = 3(1-z)^{1/2}\Bigl(1 + \frac89\, z/(1-z)\Bigr)^{1/2}$

and expand the last factor in powers of $z/(1-z)$.
b Use part (a) and exercise 5.81 to show that the coefficients of $H(z)$
are all positive.
c Prove the amazing identity

    $\sqrt{\frac{9 - H(z)}{1 - H(z)}} = \sqrt{\frac{9-z}{1-z}} + 2.$

d What are the mean and variance of $H$?


51  The  state  lottery  in  El  Dorado  uses  the  payoff  distribution  H defined 
in  the  previous  problem.  Each  lottery  ticket  costs  1 doubloon,  and  the 
payoff is $k$ doubloons with probability $h_k$. Your chance of winning with
each  ticket  is  completely  independent  of  your  chance  with  other  tickets; 
in  other  words,  winning  or  losing  with  one  ticket  does  not  affect  your 
probability  of  winning  with  any  other  ticket  you  might  have  purchased 
in  the  same  lottery. 

a Suppose  you  start  with  one  doubloon  and  play  this  game.  If  you  win 
k doubloons,  you  buy  k tickets  in  the  second  game;  then  you  take 
the  total  winnings  in  the  second  game  and  apply  all  of  them  to  the 
third;  and  so  on.  If  none  of  your  tickets  is  a winner,  you’re  broke 
and you have to stop gambling. Prove that the pgf of your current
holdings after $n$ rounds of such play is

    $1 - \frac{4}{\sqrt{(9-z)/(1-z)} + 2n - 1} + \frac{4}{\sqrt{(9-z)/(1-z)} + 2n + 1}.$

b Let $g_n$ be the probability that you lose all your money for the first
time on the $n$th game, and let $G(z) = g_1 z + g_2 z^2 + \cdots$. Prove
that $G(1) = 1$. (This means that you’re bound to lose sooner or
later, with probability 1, although you might have fun playing in
the meantime.) What are the mean and the variance of $G$?

A doubledoubloon.

c What  is  the  average  total  number  of  tickets  you  buy,  if  you  continue 
to  play  until  going  broke? 

d What  is  the  average  number  of  games  until  you  lose  everything  if 
you  start  with  two  doubloons  instead  of  just  one? 

Bonus  problems 

52 Show that the text’s definitions of median and mode for random variables
correspond  in  some  meaningful  sense  to  the  definitions  of  median  and 
mode  for  sequences,  when  the  probability  space  is  finite. 

53  Prove  or  disprove:  If  X,  Y,  and  Z are  random  variables  with  the  property 
that  all  three  pairs  (X,  Y),  (X,  Z)  and  (Y,  Z)  are  independent,  then  X + Y 
is  independent  of  Z. 

54 Equation (8.20) proves that the average value of $\widehat{V}X$ is $VX$. What is the
variance of $\widehat{V}X$?

55 A normal deck of playing cards contains 52 cards, four each with face
values in the set {A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K}. Let $X$ and $Y$ denote
the respective face values of the top and bottom cards, and consider the
following algorithm for shuffling:

S1 Permute the deck randomly so that each arrangement occurs with
probability 1/52!.

S2 If $X \ne Y$, flip a biased coin that comes up heads with probability $p$,
and go back to step S1 if heads turns up. Otherwise stop.

Each coin flip and each permutation is assumed to be independent of all
the other randomizations. What value of $p$ will make $X$ and $Y$ independent
random variables after this procedure stops?

56 Generalize the frisbee problem of exercise 48 from a pentagon to an
$m$-gon. What are the mean and variance of the number of collision-free
throws in general, when the frisbees are initially at adjacent vertices?
Show that, if $m$ is odd, the pgf for the number of throws can be written
as a product of coin-flipping distributions:

    $G_m(z) = \prod_{k=1}^{(m-1)/2} \frac{p_k z}{1 - q_k z},$

    where $p_k = \sin^2\frac{(2k-1)\pi}{2m}$, $\quad q_k = \cos^2\frac{(2k-1)\pi}{2m}$.

Hint: Try the substitution $z = 1/\cos^2\theta$.

57 Prove that the Penney-ante pattern $\tau_1\tau_2\ldots\tau_{l-1}\tau_l$ is always inferior to
the pattern $\bar\tau_2\tau_1\tau_2\ldots\tau_{l-1}$ when a fair coin is flipped, if $l \ge 3$.

58 Are there patterns $A$ and $B$ of heads and tails such that $A$ is longer
than $B$, yet $A$ appears before $B$ more than half the time when a fair coin
is being flipped?

59  Let  k and  n be  fixed  positive  integers  with  k < n. 

a Find a closed form for the probability generating function

    $G(w,z) = \frac{1}{m^n} \sum_{h_1=1}^{m} \cdots \sum_{h_n=1}^{m} w^{P(h_1,\ldots,h_n;k)}\, z^{P(h_1,\ldots,h_n;n)}$

for the joint distribution of the numbers of probes needed to find the
$k$th and $n$th items that have been inserted into a hash table with
$m$ lists.

b Although the random variables $P(h_1,\ldots,h_n;k)$ and $P(h_1,\ldots,h_n;n)$
are dependent, show that they are somewhat independent:

    $E\bigl(P(h_1,\ldots,h_n;k)\,P(h_1,\ldots,h_n;n)\bigr) = \bigl(E P(h_1,\ldots,h_n;k)\bigr)\bigl(E P(h_1,\ldots,h_n;n)\bigr).$


60  Use  the  result  of  the  previous  exercise  to  prove  (8.103). 

61 Continuing exercise 47, find the variance of the number of diphages after
n irradiations. 


Research  problems 

62 The normal distribution is a non-discrete probability distribution
characterized by having all its cumulants zero except the mean and the
variance. Is there an easy way to tell if a given sequence of cumulants
$(\kappa_1, \kappa_2, \kappa_3, \ldots)$ comes from a discrete distribution? (All the
probabilities must be “atomic” in a discrete distribution.)

63 Is there any sequence $A = \tau_1\tau_2\ldots\tau_{l-1}\tau_l$ of $l \ge 3$ heads and tails such
that the sequences $H\tau_1\tau_2\ldots\tau_{l-1}$ and $T\tau_1\tau_2\ldots\tau_{l-1}$ both perform equally
well against $A$ in the game of Penney ante?


Asymptotics 


EXACT  ANSWERS  are  great  when  we  can  find  them;  there’s  something 
very  satisfying  about  complete  knowledge.  But  there’s  also  a time  when 
approximations  are  in  order.  If  we  run  into  a sum  or  a recurrence  whose 
solution  doesn’t  have  a closed  form  (as  far  as  we  can  tell),  we  still  would  like 
to  know  something  about  the  answer;  we  don’t  have  to  insist  on  all  or  nothing. 
And  even  if  we  do  have  a closed  form,  our  knowledge  might  be  imperfect,  since 
we  might  not  know  how  to  compare  it  with  other  closed  forms. 

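For example, there is (apparently) no closed form for the sum

    $S_n = \sum_{k=0}^{n} \binom{3n}{k}.$

But it is nice to know that

    $S_n \sim 2\binom{3n}{n}$, as $n \to \infty$;

we say that the sum is “asymptotic to” $2\binom{3n}{n}$. It’s even nicer to have more
detailed information, like

    $S_n = \binom{3n}{n}\Bigl(2 - \frac{4}{n} + O\bigl(\frac{1}{n^2}\bigr)\Bigr),$    (9.1)

which gives us a “relative error of order $1/n^2$.” But even this isn’t enough to
tell us how big $S_n$ is, compared with other quantities. Which is larger, $S_n$ or
the Fibonacci number $F_{4n}$? Answer: We have $S_2 = 22 > F_8 = 21$ when $n = 2$;
but $F_{4n}$ is eventually larger, because $F_{4n} \sim \phi^{4n}/\sqrt5$ and $\phi^4 \approx 6.8541$, while

    $S_n = \sqrt{\frac{3}{\pi n}}\,(6.75)^n \Bigl(1 - \frac{151}{72n} + O\bigl(\frac{1}{n^2}\bigr)\Bigr).$    (9.2)

Uh oh ... here
comes that A-word.
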

Our  goal  in  this  chapter  is  to  learn  how  to  understand  and  to  derive  results 
like  this  without  great  pain. 
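
The comparison between $S_n$ and $F_{4n}$ can be checked by brute force for small $n$. The following is only an illustrative sketch: the helper names `S` and `fib` are our own, and the sum is assumed to be $S_n = \sum_{k=0}^{n}\binom{3n}{k}$, which agrees with the value $S_2 = 22$ quoted above.

```python
from math import comb


def S(n: int) -> int:
    """Exact value of S_n = sum_{k=0}^{n} C(3n, k) (assumed definition)."""
    return sum(comb(3 * n, k) for k in range(n + 1))


def fib(n: int) -> int:
    """Fibonacci number F_n, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    for n in range(1, 8):
        # The last column is S_n / (2 C(3n, n)), which should drift toward 1.
        ratio = S(n) / (2 * comb(3 * n, n))
        print(n, S(n), fib(4 * n), round(ratio, 4))
```

Exact integer arithmetic is used on purpose, so that the comparison of $S_n$ with $F_{4n}$ is not blurred by rounding.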

The word asymptotic stems from a Greek root meaning “not falling
together.” When ancient Greek mathematicians studied conic sections, they
considered hyperbolas like the graph of $y = \sqrt{1 + x^2}$,

[Figure: the hyperbola $y = \sqrt{1 + x^2}$ with its asymptotes.]

which has the lines $y = x$ and $y = -x$ as “asymptotes.” The curve approaches
but never quite touches these asymptotes, when $x \to \infty$. Nowadays we use
“asymptotic” in a broader sense to mean any approximate value that gets
closer and closer to the truth, when some parameter approaches a limiting
value. For us, asymptotics means “almost falling together.”

Some asymptotic formulas are very difficult to derive, well beyond the
scope of this book. We will content ourselves with an introduction to the
subject; we hope to acquire a suitable foundation on which further techniques can
be built. We will be particularly interested in understanding the definitions
of ‘$\sim$’ and ‘$O$’ and similar symbols, and we’ll study basic ways to manipulate
asymptotic quantities.


9.1  A HIERARCHY 

Functions of $n$ that occur in practice usually have different “asymptotic
growth ratios”; one of them will approach infinity faster than another.
We formalize this by saying that

    $f(n) \prec g(n) \iff \lim_{n\to\infty} \frac{f(n)}{g(n)} = 0.$    (9.3)

This relation is transitive: If $f(n) \prec g(n)$ and $g(n) \prec h(n)$ then $f(n) \prec h(n)$.
We also may write $g(n) \succ f(n)$ if $f(n) \prec g(n)$. This notation was introduced
in 1871 by Paul du Bois-Reymond [29].

For example, $n \prec n^2$; informally we say that $n$ grows more slowly
than $n^2$. In fact,

    $n^\alpha \prec n^\beta \iff \alpha < \beta,$    (9.4)

when $\alpha$ and $\beta$ are arbitrary real numbers.

There are, of course, many functions of $n$ besides powers of $n$. We can
use the $\prec$ relation to rank lots of functions into an asymptotic pecking order
that includes entries like this:

    $1 \prec \log\log n \prec \log n \prec n^\epsilon \prec n^c \prec n^{\log n} \prec c^n \prec n^n \prec c^{c^n}.$

(Here $\epsilon$ and $c$ are arbitrary constants with $0 < \epsilon < 1 < c$.)

Other words like
‘symptom’ and
‘ptomaine’ also
come from this root.

All functions
great and small.

All  functions  listed  here,  except  1 , go  to  infinity  as  n goes  to  infinity. 
Thus  when  we  try  to  place  a new  function  in  this  hierarchy,  we’re  not  trying 
to  determine  whether  it  becomes  infinite  but  rather  how  fast. 

It helps to cultivate an expansive attitude when we’re doing asymptotic
analysis: We should think big, when imagining a variable that approaches
infinity. For example, the hierarchy says that $\log n \prec n^{0.0001}$; this might
seem wrong if we limit our horizons to teeny-tiny numbers like one googol,
$n = 10^{100}$. For in that case, $\log n = 100$, while $n^{0.0001}$ is only $10^{0.01} \approx 1.0233$.
But if we go up to a googolplex, $n = 10^{10^{100}}$, then $\log n = 10^{100}$ pales in
comparison with $n^{0.0001} = 10^{10^{96}}$.

Even if $\epsilon$ is extremely small (smaller than, say, $1/10^{10^{100}}$), the value
of $\log n$ will be much smaller than the value of $n^\epsilon$, if $n$ is large enough. For
if we set $n = 10^{10^{2k}}$, where $k$ is so large that $\epsilon \ge 10^{-k}$, we have $\log n = 10^{2k}$
but $n^\epsilon \ge 10^{10^k}$. The ratio $(\log n)/n^\epsilon$ therefore approaches zero as $n \to \infty$.
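
The googol and googolplex comparison is easy to replay if we work with base-10 logarithms of the two quantities instead of the quantities themselves. This is a minimal sketch under that assumption; the function name `compare_log_vs_power` is ours, and "log" is taken to mean the base-10 logarithm, as in the numerical example above.

```python
from fractions import Fraction
from math import log10


def compare_log_vs_power(e: int, eps: Fraction):
    """For n = 10**e, return (log10(log10 n), log10(n**eps)).

    Working one logarithm down keeps every number representable:
    log10 n = e, so log10(log10 n) = log10(e), and log10(n**eps) = eps * e.
    """
    log_of_log = log10(e)   # float; Python accepts very large integers here
    log_of_power = eps * e  # exact rational arithmetic
    return log_of_log, log_of_power


if __name__ == "__main__":
    eps = Fraction(1, 10000)
    for name, e in (("googol", 100), ("googolplex", 10**100)):
        a, b = compare_log_vs_power(e, eps)
        winner = "log n" if a > b else "n**eps"
        print(f"{name}: the larger quantity is {winner}")
```

At a googol the logarithm still wins; at a googolplex the power $n^{0.0001}$ is overwhelmingly larger, as the text says.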

The hierarchy shown above deals with functions that go to infinity. Often,
however, we’re interested in functions that go to zero, so it’s useful to have
a similar hierarchy for those functions. We get one by taking reciprocals,
because when $f(n)$ and $g(n)$ are never zero we have

    $f(n) \prec g(n) \iff \frac{1}{g(n)} \prec \frac{1}{f(n)}.$    (9.5)

Thus, for example, the following functions (except 1) all go to zero:

    $\frac{1}{c^{c^n}} \prec \frac{1}{n^n} \prec \frac{1}{c^n} \prec \frac{1}{n^{\log n}} \prec \frac{1}{n^c} \prec \frac{1}{n^\epsilon} \prec \frac{1}{\log n} \prec \frac{1}{\log\log n} \prec 1.$


Let’s look at a few other functions to see where they fit in. The number
$\pi(n)$ of primes less than or equal to $n$ is known to be approximately $n/\ln n$.
Since $1/n^\epsilon \prec 1/\ln n \prec 1$, multiplying by $n$ tells us that

    $n^{1-\epsilon} \prec \pi(n) \prec n.$

We can in fact generalize (9.4) by noticing, for example, that

    $n^{\alpha_1}(\log n)^{\alpha_2}(\log\log n)^{\alpha_3} \prec n^{\beta_1}(\log n)^{\beta_2}(\log\log n)^{\beta_3}$
        $\iff (\alpha_1,\alpha_2,\alpha_3) < (\beta_1,\beta_2,\beta_3).$    (9.6)

Here ‘$(\alpha_1,\alpha_2,\alpha_3) < (\beta_1,\beta_2,\beta_3)$’ means lexicographic order (dictionary
order); in other words, either $\alpha_1 < \beta_1$, or $\alpha_1 = \beta_1$ and $\alpha_2 < \beta_2$, or $\alpha_1 = \beta_1$
and $\alpha_2 = \beta_2$ and $\alpha_3 < \beta_3$.
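
Rule (9.6) reduces an asymptotic comparison to a dictionary-order comparison of exponent triples, which is exactly how Python compares tuples. The sketch below assumes that encoding; `grows_slower` and `log_value` are illustrative names of our own, and `log_value` uses natural logarithms with $L = \ln n$ as its argument.

```python
import math


def grows_slower(alpha, beta) -> bool:
    """True if n^a1 (log n)^a2 (log log n)^a3 is 'prec' the beta version.

    By rule (9.6) this holds exactly when alpha < beta lexicographically;
    Python's tuple comparison is lexicographic.
    """
    return tuple(alpha) < tuple(beta)


def log_value(L: float, exps) -> float:
    """Natural log of n^a1 (ln n)^a2 (ln ln n)^a3, written in terms of L = ln n.

    Requires L > 1 so that ln ln n is defined.
    """
    a1, a2, a3 = exps
    return a1 * L + a2 * math.log(L) + a3 * math.log(math.log(L))


if __name__ == "__main__":
    print(grows_slower((1, 5, 0), (2, -3, 0)))     # first exponents decide
    print(grows_slower((1, 0, 7), (1, 1, -100)))   # tie broken by the log n exponent
```

The second example is a reminder to think big: the ordering predicted by (9.6) only becomes visible numerically once $\ln n$ is in the hundreds, because $(\log\log n)^{107}$ takes a long time to lose to $\log n$.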

How about the function $e^{\sqrt{\log n}}$; where does it live in the hierarchy? We
can answer questions like this by using the rule

    $e^{f(n)} \prec e^{g(n)} \iff \lim_{n\to\infty} \bigl(f(n) - g(n)\bigr) = -\infty,$    (9.7)

which follows in two steps from definition (9.3) by taking logarithms.
Consequently

    $1 \prec f(n) \prec g(n) \implies e^{|f(n)|} \prec e^{|g(n)|}.$

And since $1 \prec \log\log n \prec \sqrt{\log n} \prec \epsilon\log n$, we have $\log n \prec e^{\sqrt{\log n}} \prec n^\epsilon$.

When two functions $f(n)$ and $g(n)$ have the same rate of growth, we
write ‘$f(n) \asymp g(n)$’. The official definition is:

    $f(n) \asymp g(n) \iff |f(n)| \le C|g(n)|$ and $|g(n)| \le C|f(n)|$,
        for some $C$ and for all sufficiently large $n$.    (9.8)

This holds, for example, if $f(n)$ is constant and $g(n) = \cos n + \arctan n$. We
will prove shortly that it holds whenever $f(n)$ and $g(n)$ are polynomials of
the same degree. There’s also a stronger relation, defined by the rule

    $f(n) \sim g(n) \iff \lim_{n\to\infty} \frac{f(n)}{g(n)} = 1.$    (9.9)

In this case we say that “$f(n)$ is asymptotic to $g(n)$.”

G. H. Hardy [148] introduced an interesting and important concept called
the class of logarithmico-exponential functions, defined recursively as the
smallest family $\mathcal{L}$ of functions satisfying the following properties:

The constant function $f(n) = \alpha$ is in $\mathcal{L}$ for all real $\alpha$.

The identity function $f(n) = n$ is in $\mathcal{L}$.

If $f(n)$ and $g(n)$ are in $\mathcal{L}$, so is $f(n) - g(n)$.

If $f(n)$ is in $\mathcal{L}$, so is $e^{f(n)}$.

If $f(n)$ is in $\mathcal{L}$ and is “eventually positive,” then $\ln f(n)$ is in $\mathcal{L}$.

A function $f(n)$ is called “eventually positive” if there is an integer $n_0$ such
that $f(n) > 0$ whenever $n \ge n_0$.

We can use these rules to show, for example, that $f(n) + g(n)$ is in $\mathcal{L}$
whenever $f(n)$ and $g(n)$ are, because $f(n) + g(n) = f(n) - (0 - g(n))$. If $f(n)$
and $g(n)$ are eventually positive members of $\mathcal{L}$, their product
$f(n)\,g(n) = e^{\ln f(n) + \ln g(n)}$ and quotient $f(n)/g(n) = e^{\ln f(n) - \ln g(n)}$ are in $\mathcal{L}$;
so are functions like $\sqrt{f(n)} = e^{\frac12\ln f(n)}$, etc. Hardy proved that every
logarithmico-exponential function is eventually positive, eventually negative, or
identically zero. Therefore the product and quotient of any two $\mathcal{L}$-functions is in $\mathcal{L}$,
except that we cannot divide by a function that’s identically zero.

Hardy’s main theorem about logarithmico-exponential functions is that
they form an asymptotic hierarchy: If $f(n)$ and $g(n)$ are any functions in $\mathcal{L}$,
then either $f(n) \prec g(n)$, or $f(n) \succ g(n)$, or $f(n) \asymp g(n)$. In the last case
there is, in fact, a constant $\alpha$ such that

    $f(n) \sim \alpha\, g(n).$

The proof of Hardy’s theorem is beyond the scope of this book; but it’s nice
to know that the theorem exists, because almost every function we ever need
to deal with is in $\mathcal{L}$. In practice, we can generally fit a given function into a
given hierarchy without great difficulty.


“... wir durch das
Zeichen O(n) eine
Größe ausdrücken,
deren Ordnung in
Bezug auf n die
Ordnung von n
nicht überschreitet;
ob sie wirklich
Glieder von der
Ordnung n in sich
enthält, bleibt bei
dem bisherigen
Schlußverfahren
dahingestellt.”

P. Bachmann [14]


9.2  O NOTATION

A wonderful notational convention for asymptotic analysis was
introduced by Paul Bachmann in 1894 and popularized in subsequent years by
Edmund Landau and others. We have seen it in formulas like

    $H_n = \ln n + \gamma + O(1/n),$    (9.10)

which tells us that the $n$th harmonic number is equal to the natural logarithm
of $n$ plus Euler’s constant, plus a quantity that is “Big Oh of 1 over $n$.” This
last quantity isn’t specified exactly; but whatever it is, the notation claims
that its absolute value is no more than a constant times $1/n$.
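
A quick numerical experiment makes the claim in (9.10) tangible: if the unspecified quantity is at most a constant times $1/n$, then $n\,(H_n - \ln n - \gamma)$ must stay bounded. The sketch below is only a sanity check, not a proof; the names `harmonic` and `scaled_error` are ours, and the value of Euler’s constant is hard-coded to double precision.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, rounded to double precision


def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n, summed accurately with math.fsum."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def scaled_error(n: int) -> float:
    """n * (H_n - ln n - gamma); (9.10) says this stays bounded as n grows."""
    return n * (harmonic(n) - math.log(n) - EULER_GAMMA)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10000):
        print(n, scaled_error(n))
```

For these sample values the scaled error stays between 0 and 1/2, so a constant such as $C = \frac12$ is consistent with the data.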

The beauty of O-notation is that it suppresses unimportant detail and
lets us concentrate on salient features: The quantity $O(1/n)$ is negligibly
small, if constant multiples of $1/n$ are unimportant.

Furthermore we get to use $O$ right in the middle of a formula. If we want
to express (9.10) in terms of the notations in Section 9.1, we must transpose
‘$\ln n + \gamma$’ to the left side and specify a weaker result like

    $H_n - \ln n - \gamma \prec \frac{\log\log n}{n}$

or a stronger result like

    $H_n - \ln n - \gamma \asymp \frac{1}{n}.$

The Big Oh notation allows us to specify an appropriate amount of detail
in place, without transposition.

The idea of imprecisely specified quantities can be made clearer if we
consider some additional examples. We occasionally use the notation ‘$\pm1$’ to
stand for something that is either $+1$ or $-1$; we don’t know (or perhaps we
don’t care) which it is, yet we can manipulate it in formulas.

N. G. de Bruijn begins his book Asymptotic Methods in Analysis by
considering a Big Ell notation that helps us understand Big Oh. If we write
$L(5)$ for a number whose absolute value is less than 5 (but we don’t say what
the number is), then we can perform certain calculations without knowing
the full truth. For example, we can deduce formulas such as $1 + L(5) = L(6)$;
$L(2) + L(3) = L(5)$; $L(2)L(3) = L(6)$; $e^{L(5)} = L(e^5)$; and so on. But we cannot
conclude that $L(5) - L(3) = L(2)$, since the left side might be $4 - 0$. In fact,
the most we can say is $L(5) - L(3) = L(8)$.
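
The Big Ell rules amount to arithmetic on bounds, which a tiny class can mimic. This is an illustrative sketch under our own naming (the class `L` and its methods are not from de Bruijn’s book): an `L(b)` object stands for some unknown number whose absolute value is less than `b`, and each operation returns the best bound that can be guaranteed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class L:
    """An unspecified number whose absolute value is less than `bound`."""
    bound: float

    def __add__(self, other):
        # |x + y| <= |x| + |y|; a plain number contributes its absolute value.
        other_bound = other.bound if isinstance(other, L) else abs(other)
        return L(self.bound + other_bound)

    __radd__ = __add__

    def __sub__(self, other):
        # Subtraction cannot cancel unknown quantities: the bounds still add.
        other_bound = other.bound if isinstance(other, L) else abs(other)
        return L(self.bound + other_bound)

    def __mul__(self, other):
        # |x * y| = |x| * |y|.
        other_bound = other.bound if isinstance(other, L) else abs(other)
        return L(self.bound * other_bound)

    def exp(self):
        # |e^x| < e^b whenever |x| < b.
        return L(math.exp(self.bound))


if __name__ == "__main__":
    print(1 + L(5), L(2) + L(3), L(2) * L(3), L(5) - L(3))
```

Note that `L(5) - L(3)` evaluates to `L(8)`, not `L(2)`, matching the warning in the text.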

Bachmann’s O-notation is similar to L-notation but it’s even less precise:
$O(\alpha)$ stands for a number whose absolute value is at most a constant times $|\alpha|$.
We don’t say what the number is and we don’t even say what the constant is.
Of course the notion of a “constant” is nonsense if there is nothing variable
in the picture, so we use O-notation only in contexts when there’s at least
one quantity (say $n$) whose value is varying. The formula

    $f(n) = O\bigl(g(n)\bigr)$  for all $n$    (9.11)

means in this context that there is a constant $C$ such that

    $|f(n)| \le C|g(n)|$  for all $n$;    (9.12)

and when $O\bigl(g(n)\bigr)$ stands in the middle of a formula it represents a function
$f(n)$ that satisfies (9.12). The values of $f(n)$ are unknown, but we do know
that they aren’t too large. Similarly, de Bruijn’s ‘$L(n)$’ represents an
unspecified function $f(n)$ whose values satisfy $|f(n)| < |n|$. The main difference
between $L$ and $O$ is that O-notation involves an unspecified constant $C$; each
appearance of $O$ might involve a different $C$, but each $C$ is independent of $n$.
For example, we know that the sum of the first $n$ squares is

    $\Box_n = \frac13 n(n+\frac12)(n+1) = \frac13 n^3 + \frac12 n^2 + \frac16 n.$

We can write

    $\Box_n = O(n^3)$

because $|\frac13 n^3 + \frac12 n^2 + \frac16 n| \le \frac13|n|^3 + \frac12|n|^2 + \frac16|n| \le \frac13|n^3| + \frac12|n^3| + \frac16|n^3| = |n^3|$
for all integers $n$. Similarly, we have the more specific formula

    $\Box_n = \frac13 n^3 + O(n^2);$

we can also be sloppy and throw away information, saying that

    $\Box_n = O(n^{10}).$

Nothing in the definition of $O$ requires us to give a best possible bound.
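
The three bounds on the sum of squares can be spot-checked with exact rational arithmetic. The sketch below is only a finite check over a range of integers, not a proof; `square_sum` is our own name for the polynomial $\frac13 n^3 + \frac12 n^2 + \frac16 n$, and the constant $\frac23$ used for the $O(n^2)$ term is one valid choice that we supply for illustration.

```python
from fractions import Fraction


def square_sum(n: int) -> Fraction:
    """The polynomial n^3/3 + n^2/2 + n/6; equals 1^2 + ... + n^2 for n >= 0."""
    n = Fraction(n)
    return n**3 / 3 + n**2 / 2 + n / 6


def check_bounds(limit: int = 200) -> bool:
    """Check the three O-bounds from the text for all integers |n| <= limit."""
    for n in range(-limit, limit + 1):
        s = square_sum(n)
        if abs(s) > abs(n) ** 3:                                  # O(n^3) with C = 1
            return False
        if abs(s - Fraction(n) ** 3 / 3) > Fraction(2, 3) * n * n:  # n^3/3 + O(n^2)
            return False
        if abs(s) > abs(n) ** 10:                                 # the sloppy O(n^10)
            return False
    return True


if __name__ == "__main__":
    print(check_bounds())
```

The check includes negative integers and zero because the text asserts the inequality for all integers $n$, not only for the values where the polynomial counts squares.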


It’s not nonsense,
but it is pointless.


I’ve got a little
list, I’ve got a
little list,

Of annoying terms
and details that
might well be under
ground,

And that never
would be missed,
that never would be
missed.

But wait a minute. What if the variable $n$ isn’t an integer? What if we
have a formula like $S(x) = \frac13 x^3 + \frac12 x^2 + \frac16 x$, where $x$ is a real number? Then we
cannot say that $S(x) = O(x^3)$, because the ratio $S(x)/x^3 = \frac13 + \frac12 x^{-1} + \frac16 x^{-2}$
becomes unbounded when $x \to 0$. And we cannot say that $S(x) = O(x)$,
because the ratio $S(x)/x = \frac13 x^2 + \frac12 x + \frac16$ becomes unbounded when $x \to \infty$.
So we apparently can’t use O-notation with $S(x)$.

The answer to this dilemma is that variables used with $O$ are generally
subject to side conditions. For example, if we stipulate that $|x| \ge 1$, or that
$x \ge \epsilon$ where $\epsilon$ is any positive constant, or that $x$ is an integer, then we can
write $S(x) = O(x^3)$. If we stipulate that $|x| \le 1$, or that $|x| \le c$ where $c$ is
any positive constant, then we can write $S(x) = O(x)$. The O-notation is
governed by its environment, by constraints on the variables involved.

These  constraints  are  often  specified  by  a limiting  relation.  For  example, 
we  might  say  that 

$$f(n) = O(g(n)) \quad\text{as } n \to \infty. \qquad (9.13)$$

This means that the $O$-condition is supposed to hold when $n$ is “near” $\infty$;
we don’t care what happens unless $n$ is quite large. Moreover, we don’t
even specify exactly what “near” means; in such cases each appearance of $O$
implicitly asserts the existence of two constants $C$ and $n_0$, such that

$$|f(n)| \le C\,|g(n)| \quad\text{whenever } n \ge n_0. \qquad (9.14)$$


You are the fairest of your sex,
Let me be your hero;
I love you as one over x,
As x approaches zero.
Positively.


The values of $C$ and $n_0$ might be different for each $O$, but they do not depend
on $n$. Similarly, the notation

$$f(x) = O(g(x)) \quad\text{as } x \to 0$$

means that there exist two constants $C$ and $\epsilon$ such that

$$|f(x)| \le C\,|g(x)| \quad\text{whenever } |x| \le \epsilon. \qquad (9.15)$$

The limiting value does not have to be $\infty$ or $0$; we can write

$$\ln z = z - 1 + O\bigl((z-1)^2\bigr) \quad\text{as } z \to 1,$$

because it can be proved that $|\ln z - z + 1| \le |z-1|^2$ when $|z-1| \le \tfrac12$.
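A quick numerical look at the last claim, restricted to real $z$ (the restriction, the sample grid, and the name `gap` are our assumptions; the statement itself concerns all $z$ near 1). The sketch measures the error of the linear approximation against $|z-1|^2$:

```python
import math

def gap(z: float) -> float:
    """|ln z - (z - 1)|, the error of the linear approximation near z = 1."""
    return abs(math.log(z) - (z - 1))

# Real samples z = 0.5, 0.501, ..., 1.5, so that |z - 1| <= 1/2.
samples = [0.5 + i / 1000 for i in range(0, 1001)]

# Ratio of the error to |z - 1|^2; z = 1 itself is skipped (0/0).
worst_ratio = max(gap(z) / (z - 1) ** 2 for z in samples if z != 1.0)

# The implied constant C = 1 is enough on this grid.
assert worst_ratio <= 1.0
print(f"largest observed ratio: {worst_ratio:.4f}")
```

The ratio is largest at the left end of the interval, which is why the side condition $|z-1| \le \tfrac12$ keeps $z$ away from 0.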

Our definition of $O$ has gradually developed, over a few pages, from something
that seemed pretty obvious to something that seems rather complex; we
now have $O$ representing an undefined function and either one or two unspecified
constants, depending on the environment. This may seem complicated
enough for any reasonable notation, but it’s still not the whole story! Another
subtle consideration lurks in the background. Namely, we need to realize that
it’s fine to write

$$\tfrac13 n^3 + \tfrac12 n^2 + \tfrac16 n = O(n^3),$$

but we should never write this equality with the sides reversed. Otherwise
we could deduce ridiculous things like $n = n^2$ from the identities $n = O(n^2)$
and $n^2 = O(n^2)$. When we work with $O$-notation and any other formulas
that involve imprecisely specified quantities, we are dealing with one-way
equalities. The right side of an equation does not give more information than
the left side, and it may give less; the right is a “crudification” of the left.

From a strictly formal point of view, the notation $O(g(n))$ does not
stand for a single function $f(n)$, but for the set of all functions $f(n)$ such
that $|f(n)| \le C\,|g(n)|$ for some constant $C$. An ordinary formula $g(n)$ that
doesn’t involve $O$-notation stands for the set containing a single function
$f(n) = g(n)$. If $S$ and $T$ are sets of functions of $n$, the notation $S + T$ stands
for the set of all functions of the form $f(n) + g(n)$, where $f(n) \in S$ and
$g(n) \in T$; other notations like $S - T$, $ST$, $S/T$, $\sqrt{S}$, $e^S$, $\ln S$ are defined similarly.
Then an “equation” between such sets of functions is, strictly speaking, a set
inclusion; the ‘$=$’ sign really means ‘$\subseteq$’. These formal definitions put all of
our $O$ manipulations on firm logical ground.

For  example,  the  “equation” 

$$\tfrac13 n^3 + O(n^2) = O(n^3)$$

means that $S_1 \subseteq S_2$, where $S_1$ is the set of all functions of the form $\tfrac13 n^3 + f_1(n)$
such that there exists a constant $C_1$ with $|f_1(n)| \le C_1|n^2|$, and where $S_2$
is the set of all functions $f_2(n)$ such that there exists a constant $C_2$ with
$|f_2(n)| \le C_2|n^3|$. We can formally prove this “equation” by taking an arbitrary
element of the left-hand side and showing that it belongs to the right-hand
side: Given $\tfrac13 n^3 + f_1(n)$ such that $|f_1(n)| \le C_1|n^2|$, we must prove
that there’s a constant $C_2$ such that $|\tfrac13 n^3 + f_1(n)| \le C_2|n^3|$. The constant
$C_2 = \tfrac13 + C_1$ does the trick, since $n^2 \le |n^3|$ for all integers $n$.

If ‘$=$’ really means ‘$\subseteq$’, why don’t we use ‘$\subseteq$’ instead of abusing the equals
sign? There are four reasons.

First, tradition. Number theorists started using the equals sign with $O$-notation
and the practice stuck. It’s sufficiently well established by now that
we cannot hope to get the mathematical community to change.

Second, tradition. Computer people are quite used to seeing equals signs
abused; for years FORTRAN and BASIC programmers have been writing
assignment statements like ‘N = N + 1’. One more abuse isn’t much.

Third, tradition. We often read ‘$=$’ as the word ‘is’. For instance we
verbalize the formula $H_n = O(\log n)$ by saying “H sub n is Big Oh of log n.”


“And to auoide the tediouse repetition of these woordes: is equalle to: I will sette as I doe often in woorke use, a paire of paralleles, or Gemowe lines of one lengthe, thus: =, bicause noe .2. thynges, can be moare equalle.”
- R. Recorde [246]

“It is obvious that the sign = is really the wrong sign for such relations, because it suggests symmetry, and there is no such symmetry. . . . Once this warning has been given, there is, however, not much harm in using the sign =, and we shall maintain it, for no other reason than that it is customary.”
- N. G. de Bruijn [62]

(Now is a good time to do warmup exercises 3 and 4.)


And in English, this ‘is’ is one-way. We say that a bird is an animal, but we
don’t say that an animal is a bird; “animal” is a crudification of “bird.”

Fourth, for our purposes it’s natural. If we limited our use of $O$-notation
to situations where it occupies the whole right side of a formula (as in the
harmonic number approximation $H_n = O(\log n)$, or as in the description of
a sorting algorithm’s running time $T(n) = O(n \log n)$) it wouldn’t matter
whether we used ‘$=$’ or something else. But when we use $O$-notation in the
middle of an expression, as we usually do in asymptotic calculations, our
intuition is well satisfied if we think of the equals sign as an equality, and if
we think of something like $O(1/n)$ as a very small quantity.

So we’ll continue to use ‘$=$’, and we’ll continue to regard $O(g(n))$ as an
incompletely specified function, knowing that we can always fall back on the
set-theoretic definition if we must.

But  we  ought  to  mention  one  more  technicality  while  we’re  picking  nits 
about  definitions:  If  there  are  several  variables  in  the  environment,  O-notation 
formally  represents  sets  of  functions  of  two  or  more  variables,  not  just  one. 
The  domain  of  each  function  is  every  variable  that  is  currently  “free”  to  vary. 

This concept can be a bit subtle, because a variable might be defined only
in parts of an expression, when it’s controlled by a $\sum$ or something similar.
For example, let’s look closely at the equation

$$\sum_{k=0}^{n} \bigl(k^2 + O(k)\bigr) = \tfrac13 n^3 + O(n^2), \qquad\text{integer } n \ge 0. \qquad (9.16)$$

The expression $k^2 + O(k)$ on the left stands for the set of all two-variable
functions of the form $k^2 + f(k,n)$ such that there exists a constant $C$ with
$|f(k,n)| \le Ck$ for $0 \le k \le n$. The sum of this set of functions, for $0 \le k \le n$,
is the set of all functions $g(n)$ of the form

$$\sum_{k=0}^{n} \bigl(k^2 + f(k,n)\bigr) = \tfrac13 n^3 + \tfrac12 n^2 + \tfrac16 n + f(0,n) + f(1,n) + \cdots + f(n,n),$$

where $f$ has the stated property. Since we have

$$\begin{aligned}
\bigl|\tfrac12 n^2 + \tfrac16 n + f(0,n) + f(1,n) + \cdots + f(n,n)\bigr|
&\le \tfrac12 n^2 + \tfrac16 n^2 + C\cdot 0 + C\cdot 1 + \cdots + C\cdot n \\
&< n^2 + C(n^2+n)/2 \le (C+1)\,n^2,
\end{aligned}$$

all such functions $g(n)$ belong to the right-hand side of (9.16); therefore (9.16)
is true.
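The argument for (9.16) can be exercised on concrete choices of $f(k,n)$. In this sketch the helper `check_916` and the sample functions are hypothetical illustrations of ours; the caller is trusted to supply an $f$ with $|f(k,n)| \le Ck$.

```python
from fractions import Fraction

def check_916(f, C: int, n_max: int = 60) -> bool:
    """Check |sum_{k=0}^{n} (k^2 + f(k, n)) - n^3/3| <= (C + 1) * n^2
    for 0 <= n <= n_max, assuming |f(k, n)| <= C * k for 0 <= k <= n."""
    for n in range(0, n_max + 1):
        total = sum(k * k + f(k, n) for k in range(0, n + 1))
        if abs(total - Fraction(n**3, 3)) > (C + 1) * n * n:
            return False
    return True

# Two sample perturbations, both bounded by 3k in absolute value.
assert check_916(lambda k, n: 3 * k, C=3)
assert check_916(lambda k, n: -3 * k if (k + n) % 2 else 2 * k, C=3)

# A perturbation that is NOT O(k) breaks the bound, as expected.
assert not check_916(lambda k, n: k * k, C=1)
```

Note that the second sample depends on both $k$ and $n$, which is exactly the two-variable freedom the text describes.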

People sometimes abuse $O$-notation by assuming that it gives an exact
order of growth; they use it as if it specifies a lower bound as well as an
upper bound. For example, an algorithm to sort $n$ numbers might be called
inefficient “because its running time is $O(n^2)$.” But a running time of $O(n^2)$
does not imply that the running time is not also $O(n)$. There’s another
notation, Big Omega, for lower bounds:

$$f(n) = \Omega(g(n)) \iff |f(n)| \ge C\,|g(n)| \quad\text{for some } C > 0. \qquad (9.17)$$

We have $f(n) = \Omega(g(n))$ if and only if $g(n) = O(f(n))$. A sorting algorithm
whose running time is $\Omega(n^2)$ is inefficient compared with one whose running
time is $O(n \log n)$, if $n$ is large enough.

Finally there’s Big Theta, which specifies an exact order of growth:

$$f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n)). \qquad (9.18)$$

We have $f(n) = \Theta(g(n))$ if and only if $f(n) \asymp g(n)$ in the notation we saw
previously, equation (9.8).

Edmund Landau [194] invented a “little oh” notation,

$$f(n) = o(g(n)) \iff |f(n)| \le \epsilon\,|g(n)| \quad\text{for all } n \ge n_0(\epsilon) \text{ and for all constants } \epsilon > 0. \qquad (9.19)$$

This is essentially the relation $f(n) \prec g(n)$ of (9.3). We also have

$$f(n) \sim g(n) \iff f(n) = g(n) + o(g(n)). \qquad (9.20)$$

Since $\Omega$ and $\Theta$ are uppercase Greek letters, the $O$ in $O$-notation must be a capital Greek Omicron. After all, Greeks invented asymptotics.

Many authors use ‘$o$’ in asymptotic formulas, but a more explicit ‘$O$’ expression
is almost always preferable. For example, the average running time
of a computer method called “bubblesort” depends on the asymptotic value
of the sum $P(n) = \sum_{k=0}^{n} k^{n-k}\,k!/n!$. Elementary asymptotic methods suffice
to prove that $P(n) \sim \sqrt{\pi n/2}$, which means that the ratio $P(n)/\sqrt{\pi n/2}$ approaches
1 as $n \to \infty$. However, the true behavior of $P(n)$ is best understood
by considering the difference, $P(n) - \sqrt{\pi n/2}$, not the ratio:


   n     $P(n)/\sqrt{\pi n/2}$     $P(n) - \sqrt{\pi n/2}$
   1          0.798                  -0.253
  10          0.878                  -0.484
  20          0.904                  -0.538
  30          0.918                  -0.561
  40          0.927                  -0.575
  50          0.934                  -0.585

The numerical evidence in the middle column is not very compelling; it certainly
is far from a dramatic proof that $P(n)/\sqrt{\pi n/2}$ approaches 1 rapidly,
if at all. But the right-hand column shows that $P(n)$ is very close indeed to
$\sqrt{\pi n/2}$. Thus we can characterize the behavior of $P(n)$ much better if we can
derive formulas of the form

$$P(n) = \sqrt{\pi n/2} + O(1),$$

or even sharper estimates like

$$P(n) = \sqrt{\pi n/2} - \tfrac23 + O(1/\sqrt{n}).$$

Stronger methods of asymptotic analysis are needed to prove $O$-results, but
the additional effort required to learn these stronger methods is amply compensated
by the improved understanding that comes with $O$-bounds.
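The ratio and difference columns are easy to recompute. The sketch below evaluates $P(n)$ exactly with rational arithmetic and converts to floating point only at the end; the function name `P` mirrors the text, everything else is our own scaffolding.

```python
import math
from fractions import Fraction

def P(n: int) -> Fraction:
    """P(n) = sum_{k=0}^{n} k^(n-k) * k! / n!, computed exactly."""
    total = sum(k ** (n - k) * math.factorial(k) for k in range(0, n + 1))
    return Fraction(total, math.factorial(n))

for n in (1, 10, 20, 30, 40, 50):
    leading = math.sqrt(math.pi * n / 2)   # the leading term sqrt(pi*n/2)
    p = float(P(n))
    print(f"{n:3d}  ratio={p / leading:.3f}  difference={p - leading:+.3f}")
```

Printing both columns side by side makes the point of the passage visible: the ratio creeps toward 1 slowly, while the difference settles quickly near a constant.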
Moreover, many sorting algorithms have running times of the form

$$T(n) = A\,n \lg n + B\,n + O(\log n)$$

for some constants $A$ and $B$. Analyses that stop at $T(n) \sim A\,n \lg n$ don’t tell
the whole story, and it turns out to be a bad strategy to choose a sorting algorithm
based just on its $A$ value. Algorithms with a good ‘$A$’ often achieve this
at the expense of a bad ‘$B$’. Since $n \lg n$ grows only slightly faster than $n$, the
algorithm that’s faster asymptotically (the one with a slightly smaller $A$ value)
might be faster only for values of $n$ that never actually arise in practice. Thus,
asymptotic methods that allow us to go past the first term and evaluate $B$
are necessary if we are to make the right choice of method.

Also lD, the Duraflame logarithm.

Notice that log log log n is undefined when n = 2.

Before we go on to study $O$, let’s talk about one more small aspect of
mathematical style. Three different notations for logarithms have been used
in this chapter: lg, ln, and log. We often use ‘lg’ in connection with computer
methods, because binary logarithms are often relevant in such cases; and
we often use ‘ln’ in purely mathematical calculations, since the formulas for
natural logarithms are nice and simple. But what about ‘log’? Isn’t this
the “common” base-10 logarithm that students learn in high school, the
“common” logarithm that turns out to be very uncommon in mathematics
and computer science? Yes; and many mathematicians confuse the issue
by using ‘log’ to stand for natural logarithms or binary logarithms. There
is no universal agreement here. But we can usually breathe a sigh of relief
when a logarithm appears inside $O$-notation, because $O$ ignores multiplicative
constants. There is no difference between $O(\lg n)$, $O(\ln n)$, and $O(\log n)$, as
$n \to \infty$; similarly, there is no difference between $O(\lg\lg n)$, $O(\ln\ln n)$, and
$O(\log\log n)$. We get to choose whichever we please; and the one with ‘log’
seems friendlier because it is more pronounceable. Therefore we generally
use ‘log’ in all contexts where it improves readability without introducing
ambiguity.
9.3 O MANIPULATION

Like any mathematical formalism, the $O$-notation has rules of manipulation
that free us from the grungy details of its definition. Once we
prove that the rules are correct, using the definition, we can henceforth work
on a higher plane and forget about actually verifying that one set of functions
is contained in another. We don’t even need to calculate the constants $C$ that
are implied by each $O$, as long as we follow rules that guarantee the existence
of such constants.

For example, we can prove once and for all that

$$n^m = O(n^{m'}), \quad\text{when } m \le m'; \qquad (9.21)$$

$$O(f(n)) + O(g(n)) = O\bigl(|f(n)| + |g(n)|\bigr). \qquad (9.22)$$

Then we can say immediately that $\tfrac13 n^3 + \tfrac12 n^2 + \tfrac16 n = O(n^3) + O(n^3) + O(n^3) =
O(n^3)$, without the laborious calculations in the previous section.

Here are some more rules that follow easily from the definition:

$$f(n) = O(f(n)); \qquad (9.23)$$

$$c \cdot O(f(n)) = O(f(n)), \quad\text{if } c \text{ is constant}; \qquad (9.24)$$

$$O(O(f(n))) = O(f(n)); \qquad (9.25)$$

$$O(f(n))\,O(g(n)) = O(f(n)\,g(n)); \qquad (9.26)$$

$$O(f(n)\,g(n)) = f(n)\,O(g(n)). \qquad (9.27)$$

Exercise  9 proves  (9.22),  and  the  proofs  of  the  others  are  similar.  We  can 
always  replace  something  of  the  form  on  the  left  by  what’s  on  the  right, 
regardless  of  the  side  conditions  on  the  variable  n. 

Equations (9.27) and (9.23) allow us to derive the identity $O(f(n)^2) =
O(f(n))^2$. This sometimes helps avoid parentheses, since we can write

$$O(\log n)^2 \quad\text{instead of}\quad O\bigl((\log n)^2\bigr).$$

Both of these are preferable to ‘$O(\log^2 n)$’, which is ambiguous because some
authors use it to mean ‘$O(\log\log n)$’.

Can we also write

$$O(\log n)^{-1} \quad\text{instead of}\quad O\bigl((\log n)^{-1}\bigr)?$$

No! This is an abuse of notation, since the set of functions $1/O(\log n)$ is
neither a subset nor a superset of $O(1/\log n)$. We could legitimately substitute
$\Omega(\log n)^{-1}$ for $O\bigl((\log n)^{-1}\bigr)$, but this would be awkward. So we’ll restrict our
use of “exponents outside the $O$” to constant, positive integer exponents.


The secret of being a bore is to tell everything.
- Voltaire

(Note: The formula $O(f(n))^2$ does not denote the set of all functions $g(n)^2$ where $g(n)$ is in $O(f(n))$; such functions $g(n)^2$ cannot be negative, but the set $O(f(n))^2$ includes negative functions. In general, when $S$ is a set, the notation $S^2$ stands for the set of all products $s_1 s_2$ with $s_1$ and $s_2$ in $S$, not for the set of all squares $s^2$ with $s \in S$.)
Power series give us some of the most useful operations of all. If the sum

$$S(z) = \sum_{n\ge0} a_n z^n$$

converges absolutely for some complex number $z = z_0$, then

$$S(z) = O(1), \quad\text{for all } |z| \le |z_0|.$$

This is obvious, because

$$|S(z)| \le \sum_{n\ge0} |a_n|\,|z|^n \le \sum_{n\ge0} |a_n|\,|z_0|^n = C < \infty.$$

In particular, $S(z) = O(1)$ as $z \to 0$, and $S(1/n) = O(1)$ as $n \to \infty$, provided
only that $S(z)$ converges for at least one nonzero value of $z$. We can use this
principle to truncate a power series at any convenient point and estimate the
remainder with $O$. For example, not only is $S(z) = O(1)$, but

$$S(z) = a_0 + O(z),$$

$$S(z) = a_0 + a_1 z + O(z^2),$$

and so on, because

$$S(z) = \sum_{0\le k<m} a_k z^k + z^m \sum_{n\ge m} a_n z^{n-m}$$

and the latter sum is $O(1)$. Table 438 lists some of the most useful asymptotic
formulas, half of which are simply based on truncation of power series
according to this rule.
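As an illustration of the truncation rule, take $S(z) = e^z$ and $m = 5$: the remainder divided by $z^5$ should stay bounded as $z \to 0$. This is a floating-point sketch with a helper name (`remainder_ratio`) of our own choosing; for very small $z$ cancellation would eventually swamp the result, so only moderately small values are sampled.

```python
import math

def remainder_ratio(z: float, m: int) -> float:
    """|e^z - sum_{k<m} z^k/k!| / |z|^m for nonzero z."""
    partial = sum(z**k / math.factorial(k) for k in range(m))
    return abs(math.exp(z) - partial) / abs(z) ** m

# z = 1, 1/2, 1/4, ..., 1/128: the ratio stays bounded, drifting toward 1/5!.
ratios = [remainder_ratio(2.0 ** -j, 5) for j in range(0, 8)]
for j, r in enumerate(ratios):
    print(f"z = 2^-{j}:  remainder / z^5 = {r:.6f}")

assert max(ratios) < 0.01   # one constant C bounds every sampled ratio
```

Any constant at least as large as the biggest sampled ratio serves as the $C$ hidden in $O(z^5)$ on this range.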

Dirichlet series, which are sums of the form $\sum_{k\ge1} a_k/k^z$, can be truncated
in a similar way: If a Dirichlet series converges absolutely when $z = z_0$, we
can truncate it at any term and get the approximation

$$\sum_{1\le k<m} a_k/k^z + O(m^{-z}),$$

valid for $\Re z \ge \Re z_0$. The asymptotic formula for Bernoulli numbers $B_n$ in
Table 438 illustrates this principle.

(Remember that $\Re$ stands for “real part.”)

On the other hand, the asymptotic formulas for $H_n$, $n!$, and $\pi(n)$ in
Table 438 are not truncations of convergent series; if we extended them indefinitely
they would diverge for all values of $n$. This is particularly easy to
see in the case of $\pi(n)$, since we have already observed in Section 7.3, Example 5,
that the power series $\sum_{k\ge0} k!/(\ln n)^k$ is everywhere divergent. Yet
these truncations of divergent series turn out to be useful approximations.
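Table 438 Asymptotic approximations, valid as $n \to \infty$ and $z \to 0$.

$$H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} + O\Bigl(\frac{1}{n^6}\Bigr). \qquad (9.28)$$

$$n! = \sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^{n}\Bigl(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} + O\Bigl(\frac{1}{n^4}\Bigr)\Bigr). \qquad (9.29)$$

$$B_n = 2\,[n \text{ even}]\,(-1)^{n/2-1}\,\frac{n!}{(2\pi)^n}\bigl(1 + 2^{-n} + 3^{-n} + O(4^{-n})\bigr). \qquad (9.30)$$

$$\pi(n) = \frac{n}{\ln n} + \frac{n}{(\ln n)^2} + \frac{2!\,n}{(\ln n)^3} + \frac{3!\,n}{(\ln n)^4} + O\Bigl(\frac{n}{(\log n)^5}\Bigr). \qquad (9.31)$$

$$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \frac{z^4}{4!} + O(z^5). \qquad (9.32)$$

$$\ln(1+z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + O(z^5). \qquad (9.33)$$

$$\frac{1}{1-z} = 1 + z + z^2 + z^3 + z^4 + O(z^5). \qquad (9.34)$$

$$(1+z)^{\alpha} = 1 + \alpha z + \binom{\alpha}{2} z^2 + \binom{\alpha}{3} z^3 + \binom{\alpha}{4} z^4 + O(z^5). \qquad (9.35)$$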


An asymptotic approximation is said to have absolute error $O(g(n))$ if
it has the form $f(n) + O(g(n))$ where $f(n)$ doesn’t involve $O$. The approximation
has relative error $O(g(n))$ if it has the form $f(n)\bigl(1 + O(g(n))\bigr)$ where
$f(n)$ doesn’t involve $O$. For example, the approximation for $H_n$ in Table 438
has absolute error $O(n^{-6})$; the approximation for $n!$ has relative error $O(n^{-4})$.
(The right-hand side of (9.29) doesn’t actually have the required form $f(n) \times
(1 + O(n^{-4}))$, but we could rewrite it

$$\sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^{n}\Bigl(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3}\Bigr)\bigl(1 + O(n^{-4})\bigr)$$

if we wanted to; a similar calculation is the subject of exercise 12.) The
absolute error of this approximation is $O(n^{n-3.5}e^{-n})$. Absolute error is related
to the number of correct decimal digits to the right of the decimal point if
the $O$ term is ignored; relative error corresponds to the number of correct
“significant figures.”
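To see a relative error of $O(n^{-4})$ in action, one can compare the explicit terms of (9.29) with exact factorials and rescale by $n^4$. The function names below are ours, and floating point limits how far $n$ can usefully be pushed; this is a sketch, not a proof.

```python
import math

def stirling(n: int) -> float:
    """Right-hand side of (9.29) with the O(1/n^4) term dropped."""
    series = 1 + 1 / (12 * n) + 1 / (288 * n**2) - 139 / (51840 * n**3)
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n * series

def relative_error(n: int) -> float:
    """(approximation - n!) / n!."""
    exact = math.factorial(n)
    return (stirling(n) - exact) / exact

for n in (5, 10, 20, 40):
    rel = relative_error(n)
    print(f"n={n:2d}  relative error={rel:+.3e}  n^4 * relative error={rel * n**4:+.5f}")
```

If the relative error really is $O(n^{-4})$, the last column should remain bounded as $n$ grows, which is what the printout suggests.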

We can use truncation of power series to prove the general laws

$$\ln\bigl(1 + O(f(n))\bigr) = O(f(n)), \quad\text{if } f(n) \prec 1; \qquad (9.36)$$

$$e^{O(f(n))} = 1 + O(f(n)), \quad\text{if } f(n) = O(1). \qquad (9.37)$$

(Relative error is nice for taking reciprocals, because $1/(1 + O(\epsilon)) = 1 + O(\epsilon)$.)

(Here we assume that $n \to \infty$; similar formulas hold for $\ln\bigl(1 + O(f(x))\bigr)$ and
$e^{O(f(x))}$ as $x \to 0$.) For example, let $\ln(1 + g(n))$ be any function belonging
to the left side of (9.36). Then there are constants $C$, $n_0$, and $c$ such that

$$|g(n)| \le C\,|f(n)| \le c < 1, \quad\text{for all } n \ge n_0.$$

It follows that the infinite sum

$$\ln(1 + g(n)) = g(n)\cdot\bigl(1 - \tfrac12 g(n) + \tfrac13 g(n)^2 - \cdots\bigr)$$

converges for all $n \ge n_0$, and the parenthesized series is bounded by the
constant $1 + \tfrac12 c + \tfrac13 c^2 + \cdots$. This proves (9.36), and the proof of (9.37) is
similar. Equations (9.36) and (9.37) combine to give the useful formula

$$\bigl(1 + O(f(n))\bigr)^{O(g(n))} = 1 + O(f(n)\,g(n)). \qquad (9.38)$$

Problem 1: Return to the Wheel of Fortune.

Let’s try our luck now at a few asymptotic problems. In Chapter 3 we
derived equation (3.13) for the number of winning positions in a certain game:

$$W = \lfloor N/K \rfloor + \tfrac12 K^2 + \tfrac52 K - 3, \qquad K = \lfloor \sqrt[3]{N} \rfloor.$$

And we promised that an asymptotic version of $W$ would be derived in Chapter 9.
Well, here we are in Chapter 9; let’s try to estimate $W$, as $N \to \infty$.

The main idea here is to remove the floor brackets, replacing $K$ by $N^{1/3} +
O(1)$. Then we can go further and write

$$K = N^{1/3}\bigl(1 + O(N^{-1/3})\bigr);$$

this is called “pulling out the large part.” (We will be using this trick a lot.)
Now we have

$$\begin{aligned}
K^2 &= N^{2/3}\bigl(1 + O(N^{-1/3})\bigr)^2 \\
&= N^{2/3}\bigl(1 + O(N^{-1/3})\bigr) = N^{2/3} + O(N^{1/3})
\end{aligned}$$

by (9.38) and (9.26). Similarly

$$\begin{aligned}
\lfloor N/K \rfloor &= N^{1-1/3}\bigl(1 + O(N^{-1/3})\bigr)^{-1} + O(1) \\
&= N^{2/3}\bigl(1 + O(N^{-1/3})\bigr) + O(1) = N^{2/3} + O(N^{1/3}).
\end{aligned}$$

It follows that the number of winning positions is

$$\begin{aligned}
W &= N^{2/3} + O(N^{1/3}) + \tfrac12\bigl(N^{2/3} + O(N^{1/3})\bigr) + O(N^{1/3}) + O(1) \\
&= \tfrac32 N^{2/3} + O(N^{1/3}). \qquad (9.39)
\end{aligned}$$

Notice how the $O$ terms absorb one another until only one remains; this is
typical, and it illustrates why $O$-notation is useful in the middle of a formula.
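Estimate (9.39) can be compared with the exact formula (3.13). In the sketch below `icbrt` and `winners` are names of our own; the last printed column is $(W - \tfrac32 N^{2/3})/N^{1/3}$, which should stay bounded if the $O(N^{1/3})$ term is right.

```python
def icbrt(N: int) -> int:
    """Floor of the cube root of N, corrected with integer arithmetic."""
    K = round(N ** (1 / 3))
    while K**3 > N:
        K -= 1
    while (K + 1) ** 3 <= N:
        K += 1
    return K

def winners(N: int) -> float:
    """W from (3.13): floor(N/K) + K^2/2 + 5K/2 - 3 with K = floor(cbrt(N))."""
    K = icbrt(N)
    return N // K + K * K / 2 + 5 * K / 2 - 3

for N in (10**3, 10**6, 10**9, 10**12):
    W = winners(N)
    main = 1.5 * N ** (2 / 3)                 # leading term of (9.39)
    print(N, W, (W - main) / N ** (1 / 3))    # scaled error, should stay bounded
```

The integer correction loops in `icbrt` guard against the floating-point cube root landing just below a perfect cube.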

Problem 2: Perturbation of Stirling’s formula.

Stirling’s approximation for $n!$ is undoubtedly the most famous asymptotic
formula of all. We will prove it later in this chapter; for now, let’s just
try to get better acquainted with its properties. We can write one version of
the approximation in the form

$$n! = \sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^{n}\Bigl(1 + \frac{a}{n} + \frac{b}{n^2} + O(n^{-3})\Bigr), \quad\text{as } n \to \infty, \qquad (9.40)$$

for certain constants $a$ and $b$. Since this holds for all large $n$, it must also be
asymptotically true when $n$ is replaced by $n - 1$:

$$\begin{aligned}
(n-1)! = {}& \sqrt{2\pi(n-1)}\,\Bigl(\frac{n-1}{e}\Bigr)^{n-1} \\
& \times \Bigl(1 + \frac{a}{n-1} + \frac{b}{(n-1)^2} + O\bigl((n-1)^{-3}\bigr)\Bigr). \qquad (9.41)
\end{aligned}$$

We know, of course, that $(n-1)! = n!/n$; hence the right-hand side of this
formula must simplify to the right-hand side of (9.40), divided by $n$.

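Let us therefore try to simplify (9.41). The first factor becomes tractable
if we pull out the large part:

$$\begin{aligned}
\sqrt{2\pi(n-1)} &= \sqrt{2\pi n}\,(1 - n^{-1})^{1/2} \\
&= \sqrt{2\pi n}\,\Bigl(1 - \frac{1}{2n} - \frac{1}{8n^2} + O(n^{-3})\Bigr).
\end{aligned}$$

Equation (9.35) has been used here.

Similarly we have

$$\begin{aligned}
\frac{a}{n-1} &= \frac{a}{n}\,(1 - n^{-1})^{-1} = \frac{a}{n} + \frac{a}{n^2} + O(n^{-3}); \\
\frac{b}{(n-1)^2} &= \frac{b}{n^2}\,(1 - n^{-1})^{-2} = \frac{b}{n^2} + O(n^{-3}); \\
O\bigl((n-1)^{-3}\bigr) &= O\bigl(n^{-3}(1 - n^{-1})^{-3}\bigr) = O(n^{-3}).
\end{aligned}$$

The only thing in (9.41) that’s slightly tricky to deal with is the factor
$(n-1)^{n-1}$, which equals

$$n^{n-1}(1 - n^{-1})^{n-1} = n^{n-1}(1 - n^{-1})^{n}\bigl(1 + n^{-1} + n^{-2} + O(n^{-3})\bigr).$$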
(We are expanding everything out until we get a relative error of $O(n^{-3})$,
because the relative error of a product is the sum of the relative errors of the
individual factors. All of the $O(n^{-3})$ terms will coalesce.)

In order to expand $(1 - n^{-1})^n$, we first compute $\ln(1 - n^{-1})$ and then
form the exponential, $e^{n\ln(1-n^{-1})}$:

$$\begin{aligned}
(1 - n^{-1})^n &= \exp\bigl(n \ln(1 - n^{-1})\bigr) \\
&= \exp\bigl(n(-n^{-1} - \tfrac12 n^{-2} - \tfrac13 n^{-3} + O(n^{-4}))\bigr) \\
&= \exp\bigl(-1 - \tfrac12 n^{-1} - \tfrac13 n^{-2} + O(n^{-3})\bigr) \\
&= \exp(-1)\cdot\exp(-\tfrac12 n^{-1})\cdot\exp(-\tfrac13 n^{-2})\cdot\exp\bigl(O(n^{-3})\bigr) \\
&= \exp(-1)\cdot\bigl(1 - \tfrac12 n^{-1} + \tfrac18 n^{-2} + O(n^{-3})\bigr) \\
&\qquad\cdot\bigl(1 - \tfrac13 n^{-2} + O(n^{-4})\bigr)\cdot\bigl(1 + O(n^{-3})\bigr) \\
&= e^{-1}\bigl(1 - \tfrac12 n^{-1} - \tfrac{5}{24} n^{-2} + O(n^{-3})\bigr).
\end{aligned}$$

Here we use the notation $\exp z$ instead of $e^z$, since it allows us to work with
a complicated exponent on the main line of the formula instead of in the
superscript position. We must expand $\ln(1 - n^{-1})$ with absolute error $O(n^{-4})$
in order to end with a relative error of $O(n^{-3})$, because the logarithm is being
multiplied by $n$.

The right-hand side of (9.41) has now been reduced to $\sqrt{2\pi n}$ times
$n^{n-1}/e^n$ times a product of several factors:

$$\begin{aligned}
&\bigl(1 - \tfrac12 n^{-1} - \tfrac18 n^{-2} + O(n^{-3})\bigr) \\
{}\cdot{}&\bigl(1 + n^{-1} + n^{-2} + O(n^{-3})\bigr) \\
{}\cdot{}&\bigl(1 - \tfrac12 n^{-1} - \tfrac{5}{24} n^{-2} + O(n^{-3})\bigr) \\
{}\cdot{}&\bigl(1 + a n^{-1} + (a+b) n^{-2} + O(n^{-3})\bigr).
\end{aligned}$$

Multiplying these out and absorbing all asymptotic terms into one $O(n^{-3})$
yields

$$1 + a n^{-1} + (a + b - \tfrac{1}{12}) n^{-2} + O(n^{-3}).$$

Hmmm; we were hoping to get $1 + a n^{-1} + b n^{-2} + O(n^{-3})$, since that’s what
we need to match the right-hand side of (9.40). Has something gone awry?
No, everything is fine; Table 438 tells us that $a = \tfrac{1}{12}$, hence $a + b - \tfrac{1}{12} = b$.

This perturbation argument doesn’t prove the validity of Stirling’s approximation,
but it does prove something: It proves that formula (9.40) cannot
be valid unless $a = \tfrac{1}{12}$. If we had replaced the $O(n^{-3})$ in (9.40) by
$c n^{-3} + O(n^{-4})$ and carried out our calculations to a relative error of $O(n^{-4})$,
we could have deduced that $b = \tfrac{1}{288}$. (This is not the easiest way to determine
the values of $a$ and $b$, but it works.)
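The multiplication of the four factors is mechanical, so it can be delegated to a few lines of exact arithmetic. The sketch treats each factor as a truncated series in $x = 1/n$ and keeps only the coefficients of $1$, $n^{-1}$, $n^{-2}$; the helper names (`mul`, `common`, `perturbed`) are ours.

```python
from fractions import Fraction as F

def mul(p, q, order=3):
    """Multiply two truncated series in x = 1/n, keeping x^0 .. x^(order-1)."""
    out = [F(0)] * order
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < order:
                out[i + j] += a * b
    return out

# Coefficients of 1, n^-1, n^-2 for the first three factors in the text.
sqrt_factor = [F(1), F(-1, 2), F(-1, 8)]     # (1 - 1/n)^(1/2)
geom_factor = [F(1), F(1), F(1)]             # (1 - 1/n)^(-1)
exp_factor = [F(1), F(-1, 2), F(-5, 24)]     # e * (1 - 1/n)^n

common = mul(mul(sqrt_factor, geom_factor), exp_factor)
assert common == [1, 0, F(-1, 12)]           # i.e. 1 - (1/12) n^-2 + O(n^-3)

def perturbed(a, b):
    """Coefficients of the full product for given constants a and b."""
    return mul(common, [F(1), F(a), F(a) + F(b)])

# With a = 1/12 the n^-2 coefficient collapses back to b, as the text observes.
assert perturbed(F(1, 12), F(1, 288)) == [1, F(1, 12), F(1, 288)]
```

The first three factors multiply to $1 - \tfrac{1}{12}n^{-2} + O(n^{-3})$ regardless of $a$ and $b$, which is why the consistency condition pins down $a$ but not $b$ at this order.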
Problem 3: The nth prime number.

Equation (9.31) is an asymptotic formula for $\pi(n)$, the number of primes
that do not exceed $n$. If we replace $n$ by $p = P_n$, the $n$th prime number, we
have $\pi(p) = n$; hence

$$n = \frac{p}{\ln p} + O\Bigl(\frac{p}{(\log p)^2}\Bigr) \qquad (9.42)$$

as $n \to \infty$. Let us try to “solve” this equation for $p$; then we will know the
approximate size of the $n$th prime.

The first step is to simplify the $O$ term. If we divide both sides by $p/\ln p$,
we find that $n \ln p/p \to 1$; hence $p/\ln p = O(n)$ and

$$O\Bigl(\frac{p}{(\log p)^2}\Bigr) = O\Bigl(\frac{n}{\log p}\Bigr) = O\Bigl(\frac{n}{\log n}\Bigr).$$

(We have $(\log p)^{-1} \le (\log n)^{-1}$ because $p \ge n$.)

The second step is to transpose the two sides of (9.42), except for the
$O$ term. This is legal because of the general rule

$$a_n = b_n + O(f(n)) \iff b_n = a_n + O(f(n)). \qquad (9.43)$$

(Each of these equations follows from the other if we multiply both sides
by $-1$ and then add $a_n + b_n$ to both sides.) Hence

$$\frac{p}{\ln p} = n + O\Bigl(\frac{n}{\log n}\Bigr) = n\bigl(1 + O(1/\log n)\bigr),$$

and we have

$$p = n \ln p\,\bigl(1 + O(1/\log n)\bigr). \qquad (9.44)$$

This is an “approximate recurrence” for $p = P_n$ in terms of itself. Our goal
is to change it into an “approximate closed form,” and we can do this by
unfolding the recurrence asymptotically. So let’s try to unfold (9.44).

By taking logarithms of both sides we deduce that

$$\ln p = \ln n + \ln\ln p + O(1/\log n). \qquad (9.45)$$

This value can be substituted for $\ln p$ in (9.44), but we would like to get rid
of all $p$’s on the right before making the substitution. Somewhere along the
line, that last $p$ must disappear; we can’t get rid of it in the normal way for
recurrences, because (9.44) doesn’t specify initial conditions for small $p$.

One way to do the job is to start by proving the weaker result $p = O(n^2)$.
This follows if we square (9.44) and divide by $pn^2$,

$$\frac{p}{n^2} = \frac{(\ln p)^2}{p}\bigl(1 + O(1/\log n)\bigr),$$

since the right side approaches zero as $n \to \infty$. OK, we know that $p = O(n^2)$;
therefore $\log p = O(\log n)$ and $\log\log p = O(\log\log n)$. We can now conclude
from (9.45) that

$$\ln p = \ln n + O(\log\log n);$$

in fact, with this new estimate in hand we can conclude that $\ln\ln p = \ln\ln n +
O(\log\log n/\log n)$, and (9.45) now yields

$$\ln p = \ln n + \ln\ln n + O(\log\log n/\log n).$$

And we can plug this into the right-hand side of (9.44), obtaining

$$p = n\ln n + n\ln\ln n + O(n).$$

This is the approximate size of the nth prime.

Get out the scratch paper again, gang.

Boo, Hiss.

We can refine this estimate by using a better approximation of $\pi(n)$ in
place of (9.42). The next term of (9.31) tells us that

$$n = \frac{p}{\ln p} + \frac{p}{(\ln p)^2} + O\Bigl(\frac{p}{(\log p)^3}\Bigr); \qquad (9.46)$$

proceeding as before, we obtain the recurrence

$$p = n \ln p\,\bigl(1 + (\ln p)^{-1}\bigr)^{-1}\bigl(1 + O(1/\log n)^2\bigr), \qquad (9.47)$$

which has a relative error of $O(1/\log n)^2$ instead of $O(1/\log n)$. Taking logarithms
and retaining proper accuracy (but not too much) now yields

$$\begin{aligned}
\ln p &= \ln n + \ln\ln p + O(1/\log n) \\
&= \ln n\,\Bigl(1 + \frac{\ln\ln p}{\ln n} + O(1/\log n)^2\Bigr); \\
\ln\ln p &= \ln\ln n + \frac{\ln\ln n}{\ln n} + O\Bigl(\frac{\log\log n}{\log n}\Bigr)^{2}.
\end{aligned}$$

Finally we substitute these results into (9.47) and our answer finds its way
out:

$$p = n\ln n + n\ln\ln n - n + n\,\frac{\ln\ln n}{\ln n} + O\Bigl(\frac{n}{\log n}\Bigr). \qquad (9.48)$$

For example, when $n = 10^6$ this estimate comes to $15631363.8 + O(n/\log n)$;
the millionth prime is actually 15485863. Exercise 21 shows that a still more
accurate approximation to $P_n$ results if we begin with a still more accurate
approximation to $\pi(n)$ in place of (9.46).
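The explicit terms of (9.48) are simple to evaluate, and for moderate $n$ the true $n$th prime can be found by sieving. The following sketch uses function names and a sieve limit of our own choosing; the limit $2n\ln(n+1) + 10$ is a convenient assumption that is generous enough for the values tried here.

```python
import math

def nth_prime_estimate(n: int) -> float:
    """Explicit terms of (9.48): n ln n + n ln ln n - n + n ln ln n / ln n."""
    L = math.log(n)
    LL = math.log(L)
    return n * L + n * LL - n + n * LL / L

def nth_prime(n: int) -> int:
    """The nth prime by a plain sieve of Eratosthenes; fine for modest n."""
    limit = max(15, int(2 * n * math.log(n + 1)) + 10)
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Cross out i*i, i*i + i, ... in one slice assignment.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    count = 0
    for i, flag in enumerate(sieve):
        if flag:
            count += 1
            if count == n:
                return i
    raise ValueError("sieve limit too small")

for n in (10**3, 10**4, 10**5):
    p, est = nth_prime(n), nth_prime_estimate(n)
    # Error measured in units of n / ln n, the size of the O term in (9.48).
    print(n, p, round(est, 1), round((est - p) * math.log(n) / n, 3))
```

The last column expresses the error in units of $n/\ln n$; if (9.48) is right it should remain bounded, though convergence in $\log n$ is slow.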
Problem 4: A sum from an old final exam.

When Concrete Mathematics was first taught at Stanford University during
the 1970-1971 term, students were asked for the asymptotic value of the
sum

$$S_n = \frac{1}{n^2+1} + \frac{1}{n^2+2} + \cdots + \frac{1}{n^2+n}, \qquad (9.49)$$

with an absolute error of $O(n^{-7})$. Let’s imagine that we’ve just been given
this problem on a (take-home) final; what is our first instinctive reaction?

No, we don’t panic. Our first reaction is to think big. If we set $n =
10^{100}$, say, and look at the sum, we see that it consists of $n$ terms, each of
which is slightly less than $1/n^2$; hence the sum is slightly less than $1/n$. In
general, we can usually get a decent start on an asymptotic problem by taking
stock of the situation and getting a ballpark estimate of the answer.

Let’s try to improve the rough estimate by pulling out the largest part
of each term. We have

$$\frac{1}{n^2+k} = \frac{1}{n^2(1 + k/n^2)} = \frac{1}{n^2}\Bigl(1 - \frac{k}{n^2} + \frac{k^2}{n^4} - \frac{k^3}{n^6} + O\Bigl(\frac{k^4}{n^8}\Bigr)\Bigr),$$

and so it’s natural to try summing all these approximations:

$$\begin{aligned}
\frac{1}{n^2+1} &= \frac{1}{n^2} - \frac{1}{n^4} + \frac{1^2}{n^6} - \frac{1^3}{n^8} + O\Bigl(\frac{1^4}{n^{10}}\Bigr) \\
\frac{1}{n^2+2} &= \frac{1}{n^2} - \frac{2}{n^4} + \frac{2^2}{n^6} - \frac{2^3}{n^8} + O\Bigl(\frac{2^4}{n^{10}}\Bigr) \\
&\;\;\vdots \\
\frac{1}{n^2+n} &= \frac{1}{n^2} - \frac{n}{n^4} + \frac{n^2}{n^6} - \frac{n^3}{n^8} + O\Bigl(\frac{n^4}{n^{10}}\Bigr) \\
S_n &= \frac{n}{n^2} - \frac{n(n+1)}{2n^4} + \cdots .
\end{aligned}$$

It looks as if we’re getting $S_n = n^{-1} - \tfrac12 n^{-2} + O(n^{-3})$, based on the sums of
the first two columns; but the calculations are getting hairy.

If we persevere in this approach, we will ultimately reach the goal; but
we won’t bother to sum the other columns, for two reasons: First, the last
column is going to give us terms that are $O(n^{-6})$, when $n/2 \le k \le n$, so we
will have an error of $O(n^{-5})$; that’s too big, and we will have to include yet
another column in the expansion. Could the exam-giver have been so sadistic?
We suspect that there must be a better way. Second, there is indeed a much
better way, staring us right in the face.

Do pajamas have buttons?
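Namely, we know a closed form for $S_n$: It’s just $H_{n^2+n} - H_{n^2}$. And we
know a good approximation for harmonic numbers, so we just apply it twice:

$$H_{n^2+n} = \ln(n^2+n) + \gamma + \frac{1}{2(n^2+n)} - \frac{1}{12(n^2+n)^2} + O\Bigl(\frac{1}{n^8}\Bigr);$$

$$H_{n^2} = \ln n^2 + \gamma + \frac{1}{2n^2} - \frac{1}{12n^4} + O\Bigl(\frac{1}{n^8}\Bigr).$$

Now we can pull out large terms and simplify, as we did when looking at
Stirling’s approximation. We have

$$\ln(n^2+n) = \ln n^2 + \ln\Bigl(1+\frac1n\Bigr) = \ln n^2 + \frac1n - \frac{1}{2n^2} + \frac{1}{3n^3} - \cdots;$$

$$\frac{1}{n^2+n} = \frac{1}{n^2} - \frac{1}{n^3} + \frac{1}{n^4} - \cdots;$$

$$\frac{1}{(n^2+n)^2} = \frac{1}{n^4} - \frac{2}{n^5} + \frac{3}{n^6} - \cdots.$$

So there’s lots of helpful cancellation, and we find

$$\begin{aligned}
S_n = n^{-1} &- \tfrac12 n^{-2} + \tfrac13 n^{-3} - \tfrac14 n^{-4} + \tfrac15 n^{-5} - \tfrac16 n^{-6}\\
&- \tfrac12 n^{-3} + \tfrac12 n^{-4} - \tfrac12 n^{-5} + \tfrac12 n^{-6}\\
&+ \tfrac16 n^{-5} - \tfrac14 n^{-6}
\end{aligned}$$

plus terms that are $O(n^{-7})$. A bit of arithmetic and we’re home free:

$$S_n = n^{-1} - \tfrac12 n^{-2} - \tfrac16 n^{-3} + \tfrac14 n^{-4} - \tfrac{2}{15} n^{-5} + \tfrac{1}{12} n^{-6} + O(n^{-7}). \tag{9.50}$$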

It would be nice if we could check this answer numerically, as we did
when we derived exact results in earlier chapters. Asymptotic formulas are
harder to verify; an arbitrarily large constant may be hiding in a $O$ term,
so any numerical test is inconclusive. But in practice, we have no reason to
believe that an adversary is trying to trap us, so we can assume that the
unknown $O$-constants are reasonably small. With a pocket calculator we find
that $S_4 = \frac1{17} + \frac1{18} + \frac1{19} + \frac1{20} = 0.2170107$; and our asymptotic estimate when
$n = 4$ comes to

$$\tfrac14\Bigl(1 + \tfrac14\Bigl(-\tfrac12 + \tfrac14\Bigl(-\tfrac16 + \tfrac14\Bigl(\tfrac14 + \tfrac14\Bigl(-\tfrac{2}{15} + \tfrac14\cdot\tfrac{1}{12}\Bigr)\Bigr)\Bigr)\Bigr)\Bigr) = 0.2170125.$$

If we had made an error of, say, $\frac{1}{12}$ in the term for $n^{-6}$, a difference of $\frac{1}{12}\cdot\frac{1}{4096}$
would have shown up in the fifth decimal place; so our asymptotic answer is
probably correct.
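
The pocket-calculator check can be repeated in exact rational arithmetic. The sketch below is illustrative only; the names `exact_sum`, `asymptotic_sum` and `COEFFS` are hypothetical, and the coefficients are those of (9.50) with the $O(n^{-7})$ term omitted.

```python
from fractions import Fraction

# Coefficients of n^-1, n^-2, ..., n^-6 in (9.50).
COEFFS = [Fraction(1), Fraction(-1, 2), Fraction(-1, 6),
          Fraction(1, 4), Fraction(-2, 15), Fraction(1, 12)]

def exact_sum(n: int) -> Fraction:
    """S_n = 1/(n^2+1) + 1/(n^2+2) + ... + 1/(n^2+n), computed exactly."""
    return sum(Fraction(1, n * n + k) for k in range(1, n + 1))

def asymptotic_sum(n: int) -> Fraction:
    """The explicit terms of (9.50), without the O(n^-7) error term."""
    return sum(c / Fraction(n) ** (j + 1) for j, c in enumerate(COEFFS))

for n in (4, 10, 100):
    diff = exact_sum(n) - asymptotic_sum(n)
    # If (9.50) is right, diff * n^7 should stay bounded as n grows.
    print(n, float(exact_sum(n)), float(asymptotic_sum(n)), float(diff * n**7))
```

Multiplying the discrepancy by $n^7$ is a slightly stronger test than comparing decimals at a single $n$: a wrong coefficient of $n^{-6}$ would make that scaled difference grow linearly in $n$.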




Problem  5:  An  infinite  sum. 

We turn now to an asymptotic question posed by Solomon Golomb [122]:
What is the approximate value of

$$S_n = \sum_{k\ge1} \frac{1}{k\,N_n(k)^2}, \tag{9.51}$$

where $N_n(k)$ is the number of digits required to write $k$ in radix $n$ notation?

First let’s try again for a ballpark estimate. The number of digits, $N_n(k)$,
is approximately $\log_n k = \log k/\log n$; so the terms of this sum are roughly
$(\log n)^2/k(\log k)^2$. Summing on $k$ gives $\approx (\log n)^2 \sum_{k\ge2} 1/k(\log k)^2$, and this
sum converges to a constant value because it can be compared to the integral

$$\int_2^\infty \frac{dx}{x(\ln x)^2} = -\frac{1}{\ln x}\Big|_2^\infty = \frac{1}{\ln 2}.$$

Therefore we expect $S_n$ to be about $C(\log n)^2$, for some constant $C$.

Hand-wavy analyses like this are useful for orientation, but we need better
estimates to solve the problem. One idea is to express $N_n(k)$ exactly:

$$N_n(k) = \lfloor \log_n k \rfloor + 1. \tag{9.52}$$

Thus, for example, $k$ has three radix $n$ digits when $n^2 \le k < n^3$, and this
happens precisely when $\lfloor \log_n k \rfloor = 2$. It follows that $N_n(k) > \log_n k$, hence
$S_n = \sum_{k\ge1} 1/kN_n(k)^2 < 1 + (\log n)^2 \sum_{k\ge2} 1/k(\log k)^2$.
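
Formula (9.52) is easy to get wrong in floating point, because `floor(log(k) / log(n))` can be off by one when $k$ is an exact power of $n$. A small sketch that sticks to integer arithmetic avoids the issue; the function name `num_digits` is hypothetical.

```python
def num_digits(k: int, n: int) -> int:
    """N_n(k): the number of radix-n digits of k >= 1, found by repeated division."""
    if k < 1 or n < 2:
        raise ValueError("need k >= 1 and n >= 2")
    digits = 0
    while k > 0:
        k //= n       # strip the lowest radix-n digit
        digits += 1
    return digits

# k has three radix-n digits exactly when n^2 <= k < n^3.
n = 7
print(all(num_digits(k, n) == 3 for k in range(n**2, n**3)))
print(num_digits(n**2 - 1, n), num_digits(n**3, n))
```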

Proceeding as in Problem 1, we can try to write $N_n(k) = \log_n k + O(1)$
and substitute this into the formula for $S_n$. The term represented here by $O(1)$
is always between 0 and 1, and it is about $\frac12$ on the average, so it seems rather
well-behaved. But still, this isn’t a good enough approximation to tell us
about $S_n$; it gives us zero significant figures (that is, high relative error) when
$k$ is small, and these are the terms that contribute the most to the sum. We
need a different idea.

The key (as in Problem 4) is to use our manipulative skills to put the
sum into a more tractable form, before we resort to asymptotic estimates. We
can introduce a new variable of summation, $m = N_n(k)$:

$$\begin{aligned}
S_n &= \sum_{k,m\ge1} \frac{[m = N_n(k)]}{k m^2}\\
&= \sum_{k,m\ge1} \frac{[n^{m-1} \le k < n^m]}{k m^2}\\
&= \sum_{m\ge1} \frac{1}{m^2}\bigl(H_{n^m-1} - H_{n^{m-1}-1}\bigr).
\end{aligned}$$

This may look worse than the sum we began with, but it’s actually a step
forward, because we have very good approximations for the harmonic numbers.

Still, we hold back and try to simplify some more. No need to rush into
asymptotics. Summation by parts allows us to group the terms for each value
of $H_{n^m-1}$ that we need to approximate:

$$S_n = \sum_{k\ge1} H_{n^k-1}\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr).$$

For example, $H_{n^2-1}$ is multiplied by $1/2^2$ and then by $-1/3^2$. (We have used
the fact that $H_{n^0-1} = H_0 = 0$.)

Into a Big Oh.

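Now we’re ready to expand the harmonic numbers. Our experience with
estimating $(n-1)!$ has taught us that it will be easier to estimate $H_{n^k}$ than
$H_{n^k-1}$, since the $(n^k-1)$’s will be messy; therefore we write

$$H_{n^k-1} = H_{n^k} - \frac{1}{n^k} = \ln n^k + \gamma + \frac{1}{2n^k} + O\Bigl(\frac{1}{n^{2k}}\Bigr) - \frac{1}{n^k} = k\ln n + \gamma - \frac{1}{2n^k} + O\Bigl(\frac{1}{n^{2k}}\Bigr).$$

Our sum now reduces to

$$\begin{aligned}
S_n &= \sum_{k\ge1}\Bigl(k\ln n + \gamma - \frac{1}{2n^k} + O\Bigl(\frac{1}{n^{2k}}\Bigr)\Bigr)\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr)\\
&= (\ln n)\Sigma_1 + \gamma\Sigma_2 - \tfrac12\Sigma_3(n) + O\bigl(\Sigma_3(n^2)\bigr).
\end{aligned} \tag{9.53}$$

There are four easy pieces left: $\Sigma_1$, $\Sigma_2$, $\Sigma_3(n)$, and $\Sigma_3(n^2)$.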

Let’s do the $\Sigma_3$’s first, since $\Sigma_3(n^2)$ is the $O$ term; then we’ll see what
sort of error we’re getting. (There’s no sense carrying out other calculations
with perfect accuracy if they will be absorbed into a $O$ anyway.) This sum is
simply a power series,

$$\Sigma_3(x) = \sum_{k\ge1}\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr)x^{-k},$$

and the series converges when $x \ge 1$ so we can truncate it at any desired point.
If we stop $\Sigma_3(n^2)$ at the term for $k = 1$, we get $\Sigma_3(n^2) = O(n^{-2})$; hence (9.53)
has an absolute error of $O(n^{-2})$. (To decrease this absolute error, we could
use a better approximation to $H_{n^k}$; but $O(n^{-2})$ is good enough for now.) If
we truncate $\Sigma_3(n)$ at the term for $k = 2$, we get

$$\Sigma_3(n) = \tfrac34 n^{-1} + O(n^{-2});$$

this is all the accuracy we need.
We might as well do $\Sigma_2$ now, since it is so easy:

$$\Sigma_2 = \sum_{k\ge1}\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr).$$

This is the telescoping series $(1 - \frac14) + (\frac14 - \frac19) + (\frac19 - \frac1{16}) + \cdots = 1$.

Finally, $\Sigma_1$ gives us the leading term of $S_n$, the coefficient of $\ln n$ in (9.53):

$$\Sigma_1 = \sum_{k\ge1} k\Bigl(\frac{1}{k^2} - \frac{1}{(k+1)^2}\Bigr).$$

This is $(1 - \frac14) + (\frac24 - \frac29) + (\frac39 - \frac3{16}) + \cdots = \frac11 + \frac14 + \frac19 + \cdots = H_\infty^{(2)} = \pi^2/6$. (If
we hadn’t applied summation by parts earlier, we would have seen directly
that $S_n \sim \sum_{k\ge1} (\ln n)/k^2$, because $H_{n^k-1} - H_{n^{k-1}-1} \sim \ln n$; so summation by
parts didn’t help us to evaluate the leading term, although it did make some
of our other work easier.)

Now we have evaluated each of the $\Sigma$’s in (9.53), so we can put everything
together and get the answer to Golomb’s problem:

$$S_n = \frac{\pi^2}{6}\ln n + \gamma - \frac{3}{8n} + O\Bigl(\frac{1}{n^2}\Bigr). \tag{9.54}$$

Notice that this grows more slowly than our original hand-wavy estimate of
$C(\log n)^2$. Sometimes a discrete sum fails to obey a continuous intuition.
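
Formula (9.54) can be tested numerically, although not by summing (9.51) directly: that series converges far too slowly. The sketch below instead uses the regrouped form $S_n = \sum_{m\ge1} (H_{n^m-1} - H_{n^{m-1}-1})/m^2$ derived above. Two choices here are assumptions of this sketch rather than anything prescribed by the text: harmonic numbers are summed exactly only up to $10^6$ and approximated by $\ln x + \gamma + 1/2x - 1/12x^2$ beyond that, and the leading part $(\ln n)\sum 1/m^2$ is added in closed form so that only geometrically small corrections are summed term by term. All function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def harmonic(x: int) -> float:
    """H_x: exact summation for x <= 10**6, asymptotic expansion beyond that."""
    if x <= 10**6:
        return math.fsum(1.0 / k for k in range(1, x + 1))
    return math.log(x) + EULER_GAMMA + 1 / (2 * x) - 1 / (12 * x * x)

def golomb_sum(n: int, terms: int = 30) -> float:
    """S_n from the regrouped sum over m, with (ln n) * pi^2/6 added in closed form."""
    ln_n = math.log(n)
    total = ln_n * math.pi ** 2 / 6
    for m in range(1, terms + 1):
        diff = harmonic(n ** m - 1) - harmonic(n ** (m - 1) - 1)
        total += (diff - ln_n) / m ** 2   # correction shrinks roughly like n^-m
    return total

def golomb_estimate(n: int) -> float:
    """Explicit terms of (9.54), without the O(1/n^2) error."""
    return math.pi ** 2 / 6 * math.log(n) + EULER_GAMMA - 3 / (8 * n)

for n in (10, 100):
    gap = golomb_sum(n) - golomb_estimate(n)
    print(n, golomb_sum(n), golomb_estimate(n), gap * n * n)
```

If (9.54) holds, the last printed column (the gap multiplied by $n^2$) should remain bounded as $n$ grows.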

Problem  6:  Big  Phi. 

Near the end of Chapter 4, we observed that the number of fractions in
the Farey series $\mathcal{F}_n$ is $1 + \Phi(n)$, where

$$\Phi(n) = \varphi(1) + \varphi(2) + \cdots + \varphi(n);$$

and we showed in (4.62) that

$$\Phi(n) = \frac12\sum_{k\ge1}\mu(k)\,\lfloor n/k\rfloor\,\lfloor 1 + n/k\rfloor. \tag{9.55}$$

Let us now try to estimate $\Phi(n)$ when $n$ is large. (It was sums like this that
led Bachmann to invent $O$-notation in the first place.)

Thinking big tells us that $\Phi(n)$ will probably be proportional to $n^2$.
For if the final factor were just $\lfloor n/k\rfloor$ instead of $\lfloor 1 + n/k\rfloor$, we would have
$|\Phi(n)| \le \frac12\sum_{k\ge1}\lfloor n/k\rfloor^2 \le \frac12\sum_{k\ge1}(n/k)^2 = \frac{\pi^2}{12}n^2$, because the Möbius
function $\mu(k)$ is either $-1$, $0$, or $+1$. The additional ‘$1 +$’ in that final factor
adds $\frac12\sum_{k\ge1}\mu(k)\lfloor n/k\rfloor$; but this is zero for $k > n$, so it cannot be more than
$nH_n = O(n\log n)$ in absolute value.
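This preliminary analysis indicates that we’ll find it advantageous to
write

$$\begin{aligned}
\Phi(n) &= \frac12\sum_{k=1}^{n}\mu(k)\Bigl(\Bigl(\frac nk\Bigr) + O(1)\Bigr)^2 = \frac12\sum_{k=1}^{n}\mu(k)\Bigl(\Bigl(\frac nk\Bigr)^2 + O\Bigl(\frac nk\Bigr)\Bigr)\\
&= \frac12\sum_{k=1}^{n}\mu(k)\Bigl(\frac nk\Bigr)^2 + O(n\log n).
\end{aligned}$$

This removes the floors; the remaining problem is to evaluate the unfloored
sum $\frac12\sum_{k=1}^{n}\mu(k)n^2/k^2$ with an accuracy of $O(n\log n)$; in other words, we
want to evaluate $\sum_{k=1}^{n}\mu(k)/k^2$ with an accuracy of $O(n^{-1}\log n)$. But that’s
easy; we can simply run the sum all the way up to $k = \infty$, because the newly
added terms are

$$\sum_{k>n}\frac{\mu(k)}{k^2} = O\Bigl(\sum_{k>n}\frac{1}{k^2}\Bigr) = O\Bigl(\sum_{k>n}\frac{1}{k(k-1)}\Bigr) = O\Bigl(\sum_{k>n}\Bigl(\frac{1}{k-1} - \frac1k\Bigr)\Bigr) = O\Bigl(\frac1n\Bigr).$$

We proved in (7.88) that $\sum_{k\ge1}\mu(k)/k^z = 1/\zeta(z)$. Hence $\sum_{k\ge1}\mu(k)/k^2 = 1/\bigl(\sum_{k\ge1}1/k^2\bigr) = 6/\pi^2$, and we have our answer:

$$\Phi(n) = \frac{3}{\pi^2}n^2 + O(n\log n). \tag{9.56}$$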
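
Both (9.55) and (9.56) lend themselves to a quick numerical check. The following sketch computes $\Phi(n)$ in two ways, by sieving Euler's totient function and by the Möbius sum (9.55), and compares the result with $3n^2/\pi^2$. The sieves and all function names are illustrative choices of this sketch, not anything taken from the text.

```python
import math

def totients(limit: int) -> list:
    """phi(0..limit) by a sieve: start from phi[k] = k, multiply by (1 - 1/p) per prime p."""
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:                      # still untouched, so p is prime
            for multiple in range(p, limit + 1, p):
                phi[multiple] -= phi[multiple] // p
    return phi

def mobius(limit: int) -> list:
    """mu(0..limit) by a sieve: flip the sign once per prime factor, zero out square multiples."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for multiple in range(p, limit + 1, p):
                if multiple > p:
                    is_prime[multiple] = False
                mu[multiple] = -mu[multiple]
            for multiple in range(p * p, limit + 1, p * p):
                mu[multiple] = 0
    return mu

def big_phi(n: int) -> int:
    """Phi(n) = phi(1) + ... + phi(n)."""
    return sum(totients(n)[1:])

def big_phi_via_mobius(n: int) -> int:
    """Phi(n) from (9.55); terms with k > n vanish, and the total is always even."""
    mu = mobius(n)
    return sum(mu[k] * (n // k) * (1 + n // k) for k in range(1, n + 1)) // 2

for n in (10, 100, 1000, 10000):
    exact = big_phi(n)
    print(n, exact, exact - 3 * n * n / math.pi ** 2, n * math.log(n))
```

The third printed column is the actual error of the leading term; it should stay well inside the $n\log n$ scale shown in the fourth column.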


9.4  TWO  ASYMPTOTIC  TRICKS 

Now that we have some facility with $O$ manipulations, let’s look at
what we’ve done from a slightly higher perspective. Then we’ll have some
important weapons in our asymptotic arsenal, when we need to do battle
with tougher problems.

Trick 1: Bootstrapping.

When we estimated the $n$th prime $P_n$ in Problem 3 of Section 9.3, we
solved an asymptotic recurrence of the form

$$P_n = n\ln P_n\bigl(1 + O(1/\log n)\bigr).$$

We proved that $P_n = n\ln n + O(n)$ by first using the recurrence to show
the weaker result $O(n^2)$. This is a special case of a general method called
bootstrapping, in which we solve a recurrence asymptotically by starting with
a rough estimate and plugging it into the recurrence; in this way we can often
derive better and better estimates, “pulling ourselves up by our bootstraps.”
Here’s another problem that illustrates bootstrapping nicely: What is the
asymptotic value of the coefficient $g_n = [z^n]\,G(z)$ in the generating function

$$G(z) = \exp\Bigl(\sum_{k\ge1}\frac{z^k}{k^2}\Bigr), \tag{9.57}$$

as $n \to \infty$? If we differentiate this equation with respect to $z$, we find

$$G'(z) = \sum_{n=0}^{\infty} n g_n z^{n-1} = \Bigl(\sum_{k\ge1}\frac{z^{k-1}}{k}\Bigr)G(z);$$

equating coefficients of $z^{n-1}$ on both sides gives the recurrence

$$n g_n = \sum_{0\le k<n}\frac{g_k}{n-k}. \tag{9.58}$$

Our problem is equivalent to finding an asymptotic formula for the solution
to (9.58), with the initial condition $g_0 = 1$. The first few values

    n      0    1    2      3       4         5          6
    g_n    1    1    3/4    19/36   107/288   641/2400   51103/259200

don’t reveal much of a pattern, and the integer sequence $\langle n!^2 g_n\rangle$ doesn’t
appear in Sloane’s Handbook [270]; therefore a closed form for $g_n$ seems out
of the question, and asymptotic information is probably the best we can hope
to derive.
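
The table can be regenerated directly from recurrence (9.58) in exact rational arithmetic. This is a minimal sketch; the function name `g_values` is hypothetical.

```python
from fractions import Fraction

def g_values(limit: int) -> list:
    """g_0..g_limit from n*g_n = sum_{0<=k<n} g_k/(n-k), with g_0 = 1."""
    g = [Fraction(1)]
    for n in range(1, limit + 1):
        g.append(sum(g[k] / (n - k) for k in range(n)) / n)
    return g

for n, value in enumerate(g_values(6)):
    print(n, value)
```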

Our first handle on this problem is the observation that $0 < g_n \le 1$ for
all $n \ge 0$; this is easy to prove by induction. So we have a start:

$$g_n = O(1).$$

This equation can, in fact, be used to “prime the pump” for a bootstrapping
operation: Plugging it in on the right of (9.58) yields

$$n g_n = \sum_{0\le k<n}\frac{O(1)}{n-k} = H_n\,O(1) = O(\log n);$$

hence we have

$$g_n = O\Bigl(\frac{\log n}{n}\Bigr), \qquad\text{for } n > 1.$$
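And we can bootstrap yet again:

$$\begin{aligned}
n g_n &= \frac1n + \sum_{0<k<n}\frac{O\bigl((1+\log k)/k\bigr)}{n-k}\\
&= \frac1n + \sum_{0<k<n}\frac{O(\log n)}{k(n-k)}\\
&= \frac1n + \sum_{0<k<n}\Bigl(\frac1k + \frac{1}{n-k}\Bigr)\frac{O(\log n)}{n}\\
&= \frac1n + \frac2n H_{n-1}\,O(\log n) = \frac1n\,O(\log n)^2,
\end{aligned}$$

obtaining

$$g_n = O\Bigl(\frac{\log n}{n}\Bigr)^2. \tag{9.59}$$

Will this go on forever? Perhaps we’ll have $g_n = O(n^{-1}\log n)^m$ for all $m$.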

Actually no; we have just reached a point of diminishing returns. The
next attempt at bootstrapping involves the sum

$$\sum_{0<k<n}\frac{1}{k^2(n-k)} = \sum_{0<k<n}\Bigl(\frac{1}{nk^2} + \frac{1}{n^2k} + \frac{1}{n^2(n-k)}\Bigr) = \frac1n H_{n-1}^{(2)} + \frac{2}{n^2}H_{n-1},$$

which is $\Omega(n^{-1})$; so we cannot get an estimate for $g_n$ that falls below $\Omega(n^{-2})$.

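In fact, we now know enough about $g_n$ to apply our old trick of pulling
out the largest part:

$$\begin{aligned}
n g_n &= \sum_{0\le k<n}\frac{g_k}{n} + \sum_{0\le k<n} g_k\Bigl(\frac{1}{n-k} - \frac1n\Bigr)\\
&= \frac1n\sum_{k\ge0} g_k - \frac1n\sum_{k\ge n} g_k + \frac1n\sum_{0\le k<n}\frac{k g_k}{n-k}.
\end{aligned} \tag{9.60}$$

The first sum here is $G(1) = \exp\bigl(\frac11 + \frac14 + \frac19 + \cdots\bigr) = e^{\pi^2/6}$, because $G(z)$
converges for all $|z| \le 1$. The second sum is the tail of the first; we can get an
upper bound by using (9.59):

$$\sum_{k\ge n} g_k = O\Bigl(\sum_{k\ge n}\frac{(\log k)^2}{k^2}\Bigr) = O\Bigl(\frac{(\log n)^2}{n}\Bigr).$$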
This last estimate follows because, for example,

$$\sum_{k>n}\frac{(\log k)^2}{k^2} < \sum_{m\ge1}\;\sum_{n^m<k\le n^{m+1}}\frac{(\log n^{m+1})^2}{k(k-1)} < \sum_{m\ge1}\frac{(m+1)^2(\log n)^2}{n^m}.$$

(Exercise 54 discusses a more general way to estimate such tails.)

The third sum in (9.60) is

$$O\Bigl(\sum_{0<k<n}\frac{(\log n)^2}{k(n-k)}\Bigr) = O\Bigl(\frac{(\log n)^3}{n}\Bigr),$$

by an argument that’s already familiar. So (9.60) proves that

$$g_n = \frac{e^{\pi^2/6}}{n^2} + O(\log n/n)^3. \tag{9.61}$$

Finally, we can feed this formula back into the recurrence, bootstrapping once
more; the result is

$$g_n = \frac{e^{\pi^2/6}}{n^2} + O(\log n/n^3). \tag{9.62}$$

(Exercise 23 peeks inside the remaining $O$ term.)
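
A numerical experiment makes (9.62) plausible: if the formula is right, $n^2 g_n$ should approach $e^{\pi^2/6} \approx 5.18$, with a discrepancy that shrinks roughly like $(\log n)/n$. The sketch below runs recurrence (9.58) in floating point; the function name `g_floats` and the sample sizes are arbitrary choices made for illustration.

```python
import math

def g_floats(limit: int) -> list:
    """g_0..g_limit from recurrence (9.58), in floating point (O(limit^2) work)."""
    g = [1.0]
    for n in range(1, limit + 1):
        g.append(math.fsum(g[k] / (n - k) for k in range(n)) / n)
    return g

LIMIT_VALUE = math.exp(math.pi ** 2 / 6)   # the predicted limit of n^2 * g_n

g = g_floats(1000)
for n in (10, 100, 1000):
    print(n, n * n * g[n], n * n * g[n] - LIMIT_VALUE)
print("predicted limit:", LIMIT_VALUE)
```

Convergence is slow, as the $O(\log n/n^3)$ error term suggests, so the agreement at $n = 1000$ is only to about two significant figures.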


Trick  2:  Trading  tails. 

We derived (9.62) in somewhat the same way we derived the asymptotic
value (9.56) of $\Phi(n)$: In both cases we started with a finite sum but got an
asymptotic value by considering an infinite sum. We couldn’t simply get the
infinite sum by introducing $O$ into the summand; we had to be careful to use
one approach when $k$ was small and another when $k$ was large.

Those derivations were special cases of an important three-step asymptotic
summation method we will now discuss in greater generality. Whenever
we want to estimate the value of $\sum_k a_k(n)$, we can try the following approach:

1 First break the sum into two disjoint ranges, $D_n$ and $T_n$. The summation
over $D_n$ should be the “dominant” part, in the sense that it includes
enough terms to determine the significant digits of the sum, when $n$ is
large. The summation over the other range $T_n$ should be just the “tail”
end, which contributes little to the overall total.

(This important method was pioneered by Laplace [195′].)

2 Find an asymptotic estimate

$$a_k(n) = b_k(n) + O\bigl(c_k(n)\bigr)$$

that is valid when $k \in D_n$. The $O$ bound need not hold when $k \in T_n$.

Asymptotics is the art of knowing where to be sloppy and where to be
precise.

3 Now prove that each of the following three sums is small:

$$\Sigma_a(n) = \sum_{k\in T_n} a_k(n); \qquad \Sigma_b(n) = \sum_{k\in T_n} b_k(n); \qquad \Sigma_c(n) = \sum_{k\in D_n} |c_k(n)|. \tag{9.63}$$

If all three steps can be completed successfully, we have a good estimate:

$$\sum_{k\in D_n\cup T_n} a_k(n) = \sum_{k\in D_n\cup T_n} b_k(n) + O\bigl(\Sigma_a(n)\bigr) + O\bigl(\Sigma_b(n)\bigr) + O\bigl(\Sigma_c(n)\bigr).$$

Here’s why. We can “chop off” the tail of the given sum, getting a good
estimate in the range $D_n$ where a good estimate is necessary:

$$\sum_{k\in D_n} a_k(n) = \sum_{k\in D_n}\bigl(b_k(n) + O(c_k(n))\bigr) = \sum_{k\in D_n} b_k(n) + O\bigl(\Sigma_c(n)\bigr).$$

And we can replace the tail with another one, even though the new tail might
be a terrible approximation to the old, because the tails don’t really matter:

$$\begin{aligned}
\sum_{k\in T_n} a_k(n) &= \sum_{k\in T_n}\bigl(b_k(n) - b_k(n) + a_k(n)\bigr)\\
&= \sum_{k\in T_n} b_k(n) + O\bigl(\Sigma_b(n)\bigr) + O\bigl(\Sigma_a(n)\bigr).
\end{aligned}$$

When we evaluated the sum in (9.60), for example, we had

$$a_k(n) = [0 \le k < n]\,g_k/(n-k), \qquad b_k(n) = g_k/n, \qquad c_k(n) = k g_k/n(n-k);$$

the ranges of summation were

$$D_n = \{0, 1, \ldots, n-1\}, \qquad T_n = \{n, n+1, \ldots\};$$

and we found that

$$\Sigma_a(n) = 0, \qquad \Sigma_b(n) = O\bigl((\log n)^2/n^2\bigr), \qquad \Sigma_c(n) = O\bigl((\log n)^3/n^2\bigr).$$

This led to (9.61).

Similarly, when we estimated $\Phi(n)$ in (9.55) we had

$$a_k(n) = \mu(k)\lfloor n/k\rfloor\lfloor 1+n/k\rfloor, \qquad b_k(n) = \mu(k)n^2/k^2, \qquad c_k(n) = n/k;$$

$$D_n = \{1, 2, \ldots, n\}, \qquad T_n = \{n+1, n+2, \ldots\}.$$

We derived (9.56) by observing that $\Sigma_a(n) = 0$, $\Sigma_b(n) = O(n)$, and $\Sigma_c(n) = O(n\log n)$.
Here’s another example where tail switching is effective. (Unlike our
previous examples, this one illustrates the trick in its full generality, with
$\Sigma_a(n) \ne 0$.) We seek the asymptotic value of

$$L_n = \sum_{k\ge0}\frac{\ln(n + 2^k)}{k!}.$$

Also, horses switch their tails when feeding time approaches.

The big contributions to this sum occur when $k$ is small, because of the $k!$ in
the denominator. In this range we have

$$\ln(n + 2^k) = \ln n + \frac{2^k}{n} - \frac{2^{2k}}{2n^2} + O\Bigl(\frac{2^{3k}}{n^3}\Bigr). \tag{9.64}$$

We can prove that this estimate holds for $0 \le k < \lfloor\lg n\rfloor$, since the original
terms that have been truncated with $O$ are bounded by the convergent series

$$\sum_{m\ge3}\frac{2^{km}}{m n^m} = \frac{2^{3k}}{n^3}\sum_{m\ge3}\frac{2^{k(m-3)}}{m n^{m-3}} \le \frac{2^{3k}}{n^3}\Bigl(1 + \frac12 + \frac14 + \cdots\Bigr) = \frac{2^{3k}}{n^3}\cdot 2.$$

(In this range, $2^k/n \le 2^{\lfloor\lg n\rfloor - 1}/n \le \frac12$.)

Therefore we can apply the three-step method just described, with

$$\begin{aligned}
a_k(n) &= \ln(n + 2^k)/k!,\\
b_k(n) &= (\ln n + 2^k/n - 4^k/2n^2)/k!,\\
c_k(n) &= 8^k/n^3k!;\\
D_n &= \{0, 1, \ldots, \lfloor\lg n\rfloor - 1\},\\
T_n &= \{\lfloor\lg n\rfloor, \lfloor\lg n\rfloor + 1, \ldots\}.
\end{aligned}$$

All we have to do is find good bounds on the three $\Sigma$’s in (9.63), and we’ll
know that $\sum_{k\ge0} a_k(n) \approx \sum_{k\ge0} b_k(n)$.

The error we have committed in the dominant part of the sum, $\Sigma_c(n) = \sum_{k\in D_n} 8^k/n^3k!$, is obviously bounded by $\sum_{k\ge0} 8^k/n^3k! = e^8/n^3$, so it can be
replaced by $O(n^{-3})$. The new tail error is

$$\begin{aligned}
|\Sigma_b(n)| &= \Bigl|\sum_{k\ge\lfloor\lg n\rfloor} b_k(n)\Bigr|\\
&< \sum_{k\ge\lfloor\lg n\rfloor}\frac{\ln n + 2^k + 4^k}{k!}\\
&< \frac{\ln n + 2^{\lfloor\lg n\rfloor} + 4^{\lfloor\lg n\rfloor}}{\lfloor\lg n\rfloor!}\sum_{k\ge0}\frac{4^k}{k!} = O\Bigl(\frac{n^2}{\lfloor\lg n\rfloor!}\Bigr).
\end{aligned}$$

“We may not be big, but we’re small.”

Since $\lfloor\lg n\rfloor!$ grows faster than any power of $n$, this minuscule error is
overwhelmed by $\Sigma_c(n) = O(n^{-3})$. The error that comes from the original tail,

$$\Sigma_a(n) = \sum_{k\ge\lfloor\lg n\rfloor} a_k(n) < \sum_{k\ge\lfloor\lg n\rfloor}\frac{k + \ln n}{k!},$$

is smaller yet.

Finally, it’s easy to sum $\sum_{k\ge0} b_k(n)$ in closed form, and we have obtained
the desired asymptotic formula:

$$\sum_{k\ge0}\frac{\ln(n + 2^k)}{k!} = e\ln n + \frac{e^2}{n} - \frac{e^4}{2n^2} + O\Bigl(\frac{1}{n^3}\Bigr). \tag{9.65}$$
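
Formula (9.65) is easy to test, because the $k!$ in the denominator makes the series converge very quickly. In the sketch below the function names and the cutoff of 80 terms are arbitrary assumptions; the comparison value $e^8$ is the bound on $n^3\,\Sigma_c(n)$ found above, used here only as a yardstick for the scaled error.

```python
import math

def tail_traded_sum(n: int, terms: int = 80) -> float:
    """L_n = sum_{k>=0} ln(n + 2^k)/k!, truncated once the terms are negligible."""
    return math.fsum(math.log(n + 2 ** k) / math.factorial(k) for k in range(terms))

def estimate_9_65(n: int) -> float:
    """Explicit terms of (9.65), without the O(1/n^3) error."""
    return math.e * math.log(n) + math.e ** 2 / n - math.e ** 4 / (2 * n * n)

for n in (100, 1000, 10000):
    gap = tail_traded_sum(n) - estimate_9_65(n)
    print(n, tail_traded_sum(n), gap, gap * n ** 3, math.e ** 8)
```

The scaled gap in the fourth column stays of the same order as $e^8$ rather than growing with $n$, which is what an $O(n^{-3})$ error requires.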


The method we’ve used makes it clear that, in fact,

$$\sum_{k\ge0}\frac{\ln(n + 2^k)}{k!} = e\ln n + \sum_{k=1}^{m-1}(-1)^{k+1}\frac{e^{2^k}}{k n^k} + O\Bigl(\frac{1}{n^m}\Bigr), \tag{9.66}$$

for any fixed $m > 0$. (This is a truncation of a series that diverges for all
fixed $n$ if we let $m \to \infty$.)

There’s only one flaw in our solution: We were too cautious. We derived
(9.64) on the assumption that $k < \lfloor\lg n\rfloor$, but exercise 53 proves that
the stated estimate is actually valid for all values of $k$. If we had known
the stronger general result, we wouldn’t have had to use the two-tail trick;
we could have gone directly to the final formula! But later we’ll encounter
problems where exchange of tails is the only decent approach available.


9.5  EULER’S  SUMMATION  FORMULA 

And now for our next trick (which is, in fact, the last important
technique that will be discussed in this book) we turn to a general method of
approximating sums that was first published by Leonhard Euler [82] in 1732.
(The idea is sometimes also associated with the name of Colin Maclaurin,
a professor of mathematics at Edinburgh who discovered it independently a
short time later [211, page 305].)

Here’s the formula:

$$\sum_{a\le k<b} f(k) = \int_a^b f(x)\,dx + \sum_{k=1}^{m}\frac{B_k}{k!}f^{(k-1)}(x)\Big|_a^b + R_m, \tag{9.67}$$

$$\text{where } R_m = (-1)^{m+1}\int_a^b\frac{B_m(\{x\})}{m!}f^{(m)}(x)\,dx, \qquad \text{integers } a \le b;\ \text{integer } m \ge 1. \tag{9.68}$$
On the left is a typical sum that we might want to evaluate. On the right is
another expression for that sum, involving integrals and derivatives. If $f(x)$ is
a sufficiently “smooth” function, it will have $m$ derivatives $f'(x), \ldots, f^{(m)}(x)$,
and this formula turns out to be an identity. The right-hand side is often an
excellent approximation to the sum on the left, in the sense that the remainder
$R_m$ is often small. For example, we’ll see that Stirling’s approximation
for $n!$ is a consequence of Euler’s summation formula; so is our asymptotic
approximation for the harmonic number $H_n$.

The numbers $B_k$ in (9.67) are the Bernoulli numbers that we met in
Chapter 6; the function $B_m(\{x\})$ in (9.68) is the Bernoulli polynomial that we
met in Chapter 7. The notation $\{x\}$ stands for the fractional part $x - \lfloor x\rfloor$, as
in Chapter 3. Euler’s summation formula sort of brings everything together.

Let’s recall the values of small Bernoulli numbers, since it’s always handy
to have them listed near Euler’s general formula:

$$B_0 = 1, \quad B_1 = -\tfrac12, \quad B_2 = \tfrac16, \quad B_4 = -\tfrac1{30}, \quad B_6 = \tfrac1{42}, \quad B_8 = -\tfrac1{30};$$

$$B_3 = B_5 = B_7 = B_9 = B_{11} = \cdots = 0.$$

Jakob Bernoulli discovered these numbers when studying the sums of powers
of integers, and Euler’s formula explains why: If we set $f(x) = x^{m-1}$, we have
$f^{(m)}(x) = 0$; hence $R_m = 0$, and (9.67) reduces to

$$\sum_{a\le k<b} k^{m-1} = \frac{x^m}{m}\Big|_a^b + \sum_{k=1}^{m}\frac{B_k}{k!}(m-1)^{\underline{k-1}}\,x^{m-k}\Big|_a^b = \frac1m\sum_{k=0}^{m}\binom mk B_k\,(b^{m-k} - a^{m-k}).$$

For example, when $m = 3$ we have our favorite example of summation:

$$\sum_{0\le k<n} k^2 = \frac13\Bigl(\binom30 B_0 n^3 + \binom31 B_1 n^2 + \binom32 B_2 n\Bigr) = \frac{n^3}{3} - \frac{n^2}{2} + \frac n6.$$

(This is the last time we shall derive this famous formula in this book.)
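
The power-sum identity above can be checked mechanically. In this sketch the Bernoulli numbers are generated from the standard recurrence $\sum_{j\le m}\binom{m+1}{j}B_j = [m=0]$, which yields $B_1 = -\frac12$ as in the list above; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m_max: int) -> list:
    """B_0..B_m_max (with B_1 = -1/2) from sum_{j<=m} C(m+1, j) B_j = [m = 0]."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def power_sum(m: int, a: int, b: int) -> Fraction:
    """sum_{a<=k<b} k^(m-1) via (1/m) * sum_k C(m,k) B_k (b^(m-k) - a^(m-k))."""
    B = bernoulli_numbers(m)
    total = sum(comb(m, k) * B[k] * (b ** (m - k) - a ** (m - k)) for k in range(m + 1))
    return total / m

print(bernoulli_numbers(8))
print(power_sum(3, 0, 10), sum(k * k for k in range(10)))
```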

Before we prove Euler’s formula, let’s look at a high-level reason (due
to Lagrange [192]) why such a formula ought to exist. Chapter 2 defines the
difference operator $\Delta$ and explains that $\Sigma$ is the inverse of $\Delta$, just as $\int$ is the
inverse of the derivative operator $D$. We can express $\Delta$ in terms of $D$ using
Taylor’s formula as follows:

$$f(x + \varepsilon) = f(x) + \frac{f'(x)}{1!}\varepsilon + \frac{f''(x)}{2!}\varepsilon^2 + \cdots.$$

All good things must come to an end.

Setting $\varepsilon = 1$ tells us that

$$\begin{aligned}
\Delta f(x) &= f(x+1) - f(x)\\
&= f'(x)/1! + f''(x)/2! + f'''(x)/3! + \cdots\\
&= (D/1! + D^2/2! + D^3/3! + \cdots)\,f(x) = (e^D - 1)\,f(x).
\end{aligned} \tag{9.69}$$

Here $e^D$ stands for the differential operation $1 + D/1! + D^2/2! + D^3/3! + \cdots$.
Since $\Delta = e^D - 1$, the inverse operator $\Sigma = 1/\Delta$ should be $1/(e^D - 1)$; and
we know from Table 337 that $z/(e^z - 1) = \sum_{k\ge0} B_k z^k/k!$ is a power series
involving Bernoulli numbers. Thus

$$\Sigma = \frac{B_0}{D} + \frac{B_1}{1!} + \frac{B_2}{2!}D + \frac{B_3}{3!}D^2 + \cdots = \int + \sum_{k\ge1}\frac{B_k}{k!}D^{k-1}. \tag{9.70}$$

Applying this operator equation to $f(x)$ and attaching limits yields

$$\sum\nolimits_a^b f(x)\,\delta x = \int_a^b f(x)\,dx + \sum_{k\ge1}\frac{B_k}{k!}f^{(k-1)}(x)\Big|_a^b, \tag{9.71}$$

which is exactly Euler’s summation formula (9.67) without the remainder
term. (Euler did not, in fact, consider the remainder, nor did anybody else
until S. D. Poisson [236] published an important memoir about approximate
summation in 1823. The remainder term is important, because the infinite
sum $\sum_{k\ge1}(B_k/k!)f^{(k-1)}(x)\big|_a^b$ often diverges. Our derivation of (9.71) has been
purely formal, without regard to convergence.)

Now let’s prove (9.67), with the remainder included. It suffices to prove
the case $a = 0$ and $b = 1$, namely

$$f(0) = \int_0^1 f(x)\,dx + \sum_{k=1}^{m}\frac{B_k}{k!}f^{(k-1)}(x)\Big|_0^1 - (-1)^m\int_0^1\frac{B_m(x)}{m!}f^{(m)}(x)\,dx,$$

because we can then replace $f(x)$ by $f(x + l)$ for any integer $l$, getting

$$f(l) = \int_l^{l+1} f(x)\,dx + \sum_{k=1}^{m}\frac{B_k}{k!}f^{(k-1)}(x)\Big|_l^{l+1} - (-1)^m\int_l^{l+1}\frac{B_m(\{x\})}{m!}f^{(m)}(x)\,dx.$$

The general formula (9.67) is just the sum of this identity over the range
$a \le l < b$, because intermediate terms telescope nicely.

The proof when $a = 0$ and $b = 1$ is by induction on $m$, starting with
$m = 1$:

$$f(0) = \int_0^1 f(x)\,dx - \frac12\bigl(f(1) - f(0)\bigr) + \int_0^1\bigl(x - \tfrac12\bigr)f'(x)\,dx.$$

(The Bernoulli polynomial $B_m(x)$ is defined by the equation

$$B_m(x) = \binom m0 B_0 x^m + \binom m1 B_1 x^{m-1} + \cdots + \binom mm B_m x^0 \tag{9.72}$$

in general, hence $B_1(x) = x - \frac12$ in particular.) In other words, we want to
prove that

$$\frac{f(0) + f(1)}{2} = \int_0^1 f(x)\,dx + \int_0^1\bigl(x - \tfrac12\bigr)f'(x)\,dx.$$

But this is just a special case of the formula

$$u(x)v(x)\Big|_0^1 = \int_0^1 u(x)\,dv(x) + \int_0^1 v(x)\,du(x) \tag{9.73}$$

for integration by parts, with $u(x) = f(x)$ and $v(x) = x - \frac12$. Hence the case
$m = 1$ is easy.

To pass from $m - 1$ to $m$ and complete the induction when $m > 1$, we
need to show that $R_{m-1} = (B_m/m!)f^{(m-1)}(x)\big|_0^1 + R_m$, namely that

$$(-1)^m\int_0^1\frac{B_{m-1}(x)}{(m-1)!}f^{(m-1)}(x)\,dx = \frac{B_m}{m!}f^{(m-1)}(x)\Big|_0^1 - (-1)^m\int_0^1\frac{B_m(x)}{m!}f^{(m)}(x)\,dx.$$

This reduces to the equation

$$(-1)^m B_m f^{(m-1)}(x)\Big|_0^1 = m\int_0^1 B_{m-1}(x)f^{(m-1)}(x)\,dx + \int_0^1 B_m(x)f^{(m)}(x)\,dx.$$

Once again (9.73) applies to these two integrals, with $u(x) = f^{(m-1)}(x)$ and
$v(x) = B_m(x)$, because the derivative of the Bernoulli polynomial (9.72) is

$$\frac{d}{dx}\sum_k\binom mk B_k x^{m-k} = \sum_k\binom mk(m-k)B_k x^{m-k-1} = m\sum_k\binom{m-1}{k}B_k x^{m-1-k} = m B_{m-1}(x). \tag{9.74}$$

Will the authors never get serious?

(The absorption identity (5.7) was useful here.) Therefore the required
formula will hold if and only if

$$(-1)^m B_m f^{(m-1)}(x)\Big|_0^1 = B_m(x)f^{(m-1)}(x)\Big|_0^1.$$

In other words, we need to have

$$(-1)^m B_m = B_m(1) = B_m(0), \qquad\text{for } m > 1. \tag{9.75}$$

This is a bit embarrassing, because $B_m(0)$ is obviously equal to $B_m$, not
to $(-1)^m B_m$. But there’s no problem really, because $m > 1$; we know that
$B_m$ is zero when $m$ is odd. (Still, that was a close call.)

To complete the proof of Euler’s summation formula we need to show
that $B_m(1) = B_m(0)$, which is the same as saying that

$$\sum_k\binom mk B_k = B_m, \qquad\text{for } m > 1.$$

But this is just the definition of Bernoulli numbers, (6.79), so we’re done.
The identity $B_m'(x) = m B_{m-1}(x)$ implies that

$$\int_0^1 B_m(x)\,dx = \frac{B_{m+1}(1) - B_{m+1}(0)}{m+1},$$

and we know now that this integral is zero when $m \ge 1$. Hence the remainder
term in Euler’s formula,

$$R_m = \frac{(-1)^{m+1}}{m!}\int_a^b B_m(\{x\})f^{(m)}(x)\,dx,$$

multiplies $f^{(m)}(x)$ by a function $B_m(\{x\})$ whose average value is zero. This
means that $R_m$ has a reasonable chance of being small.

Let’s look more closely at $B_m(x)$ for $0 \le x \le 1$, since $B_m(x)$ governs the
behavior of $R_m$. Here are the graphs for $B_m(x)$ for the first twelve values of $m$:


Although $B_3(x)$ through $B_9(x)$ are quite small, the Bernoulli polynomials
and numbers ultimately get quite large. Fortunately $R_m$ has a compensating
factor $1/m!$, which helps to calm things down.

The graph of $B_m(x)$ begins to look very much like a sine wave when
$m \ge 3$; exercise 58 proves that $B_m(x)$ can in fact be well approximated by a
negative multiple of $\cos(2\pi x - \frac12\pi m)$, with relative error $1/2^m$.

In general, $B_{4k+1}(x)$ is negative for $0 < x < \frac12$ and positive for $\frac12 < x < 1$.
Therefore its integral, $B_{4k+2}(x)/(4k+2)$, decreases for $0 < x < \frac12$ and increases
for $\frac12 < x < 1$. Moreover, we have

$$B_{4k+1}(1 - x) = -B_{4k+1}(x), \qquad\text{for } 0 \le x \le 1,$$

and it follows that

$$B_{4k+2}(1 - x) = B_{4k+2}(x), \qquad\text{for } 0 \le x \le 1.$$

The constant term $B_{4k+2}$ causes the integral $\int_0^1 B_{4k+2}(x)\,dx$ to be zero; hence
$B_{4k+2} > 0$. The integral of $B_{4k+2}(x)$ is $B_{4k+3}(x)/(4k+3)$, which must therefore
be positive when $0 < x < \frac12$ and negative when $\frac12 < x < 1$; furthermore
$B_{4k+3}(1 - x) = -B_{4k+3}(x)$, so $B_{4k+3}(x)$ has the properties stated for $B_{4k+1}(x)$,
but negated. Therefore $B_{4k+4}(x)$ has the properties stated for $B_{4k+2}(x)$, but
negated. Therefore $B_{4k+5}(x)$ has the properties stated for $B_{4k+1}(x)$; we have
completed a cycle that establishes the stated properties inductively for all $k$.

According to this analysis, the maximum value of $B_{2m}(x)$ must occur
either at $x = 0$ or at $x = \frac12$. Exercise 17 proves that

$$B_{2m}(\tfrac12) = (2^{1-2m} - 1)B_{2m}; \tag{9.76}$$

hence we have

$$|B_{2m}(\{x\})| \le |B_{2m}|. \tag{9.77}$$
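
Several of the facts just stated about Bernoulli polynomials can be confirmed in exact arithmetic for small $m$. The sketch below builds $B_m(x)$ from definition (9.72) and tests (9.75), (9.76) and (9.77) on a grid of rational points; the function names and the grid spacing of $1/64$ are arbitrary choices, and a finite check of this kind is of course no substitute for the proofs.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m_max: int) -> list:
    """B_0..B_m_max (with B_1 = -1/2) from sum_{j<=m} C(m+1, j) B_j = [m = 0]."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

B = bernoulli_numbers(12)

def bernoulli_poly(m: int, x: Fraction) -> Fraction:
    """B_m(x) = sum_k C(m,k) B_k x^(m-k), as in definition (9.72)."""
    return sum(comb(m, k) * B[k] * x ** (m - k) for k in range(m + 1))

half = Fraction(1, 2)
grid = [Fraction(j, 64) for j in range(65)]
for m in range(1, 7):
    at_half_ok = bernoulli_poly(2 * m, half) == (Fraction(2) ** (1 - 2 * m) - 1) * B[2 * m]
    bounded_ok = all(abs(bernoulli_poly(2 * m, x)) <= abs(B[2 * m]) for x in grid)
    ends_ok = bernoulli_poly(2 * m, Fraction(1)) == bernoulli_poly(2 * m, Fraction(0)) == B[2 * m]
    print(2 * m, at_half_ok, bounded_ok, ends_ok)
```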

This can be used to establish a useful upper bound on the remainder in Euler’s
summation formula, because we know from (6.89) that

$$\frac{|B_{2m}|}{(2m)!} = \frac{2}{(2\pi)^{2m}}\sum_{k\ge1}\frac{1}{k^{2m}} = O\bigl((2\pi)^{-2m}\bigr), \qquad\text{when } m > 0.$$

Therefore we can rewrite Euler’s formula (9.67) as follows:

$$\sum_{a\le k<b} f(k) = \int_a^b f(x)\,dx - \frac12 f(x)\Big|_a^b + \sum_{k=1}^{m}\frac{B_{2k}}{(2k)!}f^{(2k-1)}(x)\Big|_a^b + O\bigl((2\pi)^{-2m}\bigr)\int_a^b\bigl|f^{(2m)}(x)\bigr|\,dx. \tag{9.78}$$

For example, if $f(x) = e^x$, all derivatives are the same and this formula tells
us that $\sum_{a\le k<b} e^k = (e^b - e^a)\bigl(1 - \frac12 + B_2/2! + B_4/4! + \cdots + B_{2m}/(2m)!\bigr) + O\bigl((2\pi)^{-2m}\bigr)$. Of course, we know that this sum is actually a geometric series,
equal to $(e^b - e^a)/(e - 1) = (e^b - e^a)\sum_{k\ge0} B_k/k!$.
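
The $e^x$ example reduces to a statement about numbers alone: the partial sums $1 - \frac12 + B_2/2! + \cdots + B_{2m}/(2m)!$ should approach $1/(e-1)$ with an error of order $(2\pi)^{-2m}$. A short sketch (function names hypothetical) shows how quickly this happens.

```python
import math
from fractions import Fraction
from math import comb

def bernoulli_numbers(m_max: int) -> list:
    """B_0..B_m_max (with B_1 = -1/2) from sum_{j<=m} C(m+1, j) B_j = [m = 0]."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def partial_sum(m: int) -> float:
    """sum_{k=0}^{2m} B_k / k!, the factor multiplying (e^b - e^a) in the text."""
    B = bernoulli_numbers(2 * m)
    return float(sum(B[k] / math.factorial(k) for k in range(2 * m + 1)))

TARGET = 1 / (math.e - 1)
for m in range(1, 6):
    error = abs(partial_sum(m) - TARGET)
    print(m, partial_sum(m), error, (2 * math.pi) ** (-2 * m))
```

In this particular case the series converges; as noted earlier in the section, the corresponding infinite sum often diverges for other choices of $f$.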

If $f^{(2m)}(x) \ge 0$ for $a \le x \le b$, the integral $\int_a^b |f^{(2m)}(x)|\,dx$ is just
$f^{(2m-1)}(x)\big|_a^b$, so we have

$$|R_{2m}| \le \Bigl|\frac{B_{2m}}{(2m)!}f^{(2m-1)}(x)\Big|_a^b\Bigr|;$$

in other words, the remainder is bounded by the magnitude of the final term
(the term just before the remainder), in this case. We can give an even better
estimate if we know that

$$f^{(2m+2)}(x) \ge 0 \ \text{ and } \ f^{(2m+4)}(x) \ge 0, \qquad\text{for } a \le x \le b. \tag{9.79}$$

For it turns out that this implies the relation

$$R_{2m} = \theta_m\,\frac{B_{2m+2}}{(2m+2)!}f^{(2m+1)}(x)\Big|_a^b, \qquad\text{for some } 0 < \theta_m < 1; \tag{9.80}$$

in other words, the remainder will then lie between 0 and the first discarded
term in (9.78), the term that would follow the final term if we increased $m$.

Here’s the proof: Euler’s summation formula is valid for all $m$, and
$B_{2m+1} = 0$ when $m > 0$; hence $R_{2m} = R_{2m+1}$, and the first discarded term
must be

$$R_{2m} - R_{2m+2}.$$

We therefore want to show that $R_{2m}$ lies between 0 and $R_{2m} - R_{2m+2}$; and
this is true if and only if $R_{2m}$ and $R_{2m+2}$ have opposite signs. We claim that

$$f^{(2m+2)}(x) \ge 0 \ \text{ for } a \le x \le b \quad\text{implies}\quad (-1)^m R_{2m} \ge 0. \tag{9.81}$$

This, together with (9.79), will prove that $R_{2m}$ and $R_{2m+2}$ have opposite signs,
so the proof of (9.80) will be complete.

It’s not difficult to prove (9.81) if we recall the definition of $R_{2m+1}$ and
the facts we proved about the graph of $B_{2m+1}(x)$. Namely, we have

$$R_{2m} = R_{2m+1} = \int_a^b\frac{B_{2m+1}(\{x\})}{(2m+1)!}f^{(2m+1)}(x)\,dx,$$

and $f^{(2m+1)}(x)$ is increasing because its derivative $f^{(2m+2)}(x)$ is positive. (More
precisely, $f^{(2m+1)}(x)$ is nondecreasing because its derivative is nonnegative.)
The graph of $B_{2m+1}(\{x\})$ looks like $(-1)^{m+1}$ times a sine wave, so it is
geometrically obvious that the second half of each sine wave is more influential
than the first half when it is multiplied by an increasing function. This makes
$(-1)^m R_{2m+1} \ge 0$, as desired. Exercise 16 proves the result formally.
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9.6  FINAL  SUMMATIONS 

Now  comes  the  summing  up,  as  we  prepare  to  conclude  this  book. 
We  will  apply  Euler’s  summation  formula  to  some  interesting  and  important 
examples. 

Summation  1:  This  one  is  too  easy. 

But  first  we  will  consider  an  interesting  unimportant  example,  namely 
a sum  that  we  already  know  how  to  do.  Let’s  see  what  Euler’s  summation 
formula  tells  us  if  we  apply  it  to  the  telescoping  sum 


$$S_n = \sum_{1\le k<n} \frac{1}{k(k+1)} = \sum_{1\le k<n}\left(\frac1k - \frac1{k+1}\right) = 1 - \frac1n.$$


It  can’t  hurt  to  embark  on  our  first  serious  application  of  Euler’s  formula  with 
the  asymptotic  equivalent  of  training  wheels. 

We might as well start by writing the function $f(x) = 1/x(x+1)$ in partial
fraction form,

$$f(x) = \frac1x - \frac1{x+1},$$

since this makes it easier to integrate and differentiate. Indeed, we have
$f'(x) = -1/x^2 + 1/(x+1)^2$ and $f''(x) = 2/x^3 - 2/(x+1)^3$; in general

$$f^{(k)}(x) = (-1)^k\, k!\left(\frac1{x^{k+1}} - \frac1{(x+1)^{k+1}}\right), \qquad\text{for } k \ge 0.$$


Furthermore

$$\int_1^n f(x)\,dx = \ln x - \ln(x+1)\Big|_1^n = \ln\frac{2n}{n+1}.$$

Plugging this into the summation formula (9.67) gives

$$S_n = \ln\frac{2n}{n+1} - \sum_{k=1}^m (-1)^k\,\frac{B_k}{k}\left(\frac1{n^k} - \frac1{(n+1)^k} - 1 + \frac1{2^k}\right) + R_m(n),$$

$$\text{where}\quad R_m(n) = -\int_1^n B_m(\{x\})\left(\frac1{x^{m+1}} - \frac1{(x+1)^{m+1}}\right)dx.$$

For example, the right-hand side when $m = 4$ is

$$\ln\frac{2n}{n+1} - \frac12\left(\frac1n - \frac1{n+1} - \frac12\right) - \frac1{12}\left(\frac1{n^2} - \frac1{(n+1)^2} - \frac34\right) + \frac1{120}\left(\frac1{n^4} - \frac1{(n+1)^4} - \frac{15}{16}\right) + R_4(n).$$

This is kind of a mess; it certainly doesn’t look like the real answer $1 - n^{-1}$.
But let’s keep going anyway, to see what we’ve got. We know how to expand
the right-hand terms in negative powers of $n$ up to, say, $O(n^{-5})$:


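$$\begin{aligned}
\ln\frac{n}{n+1} &= -n^{-1} + \tfrac12 n^{-2} - \tfrac13 n^{-3} + \tfrac14 n^{-4} + O(n^{-5});\\
\frac1{n+1} &= n^{-1} - n^{-2} + n^{-3} - n^{-4} + O(n^{-5});\\
\frac1{(n+1)^2} &= n^{-2} - 2n^{-3} + 3n^{-4} + O(n^{-5});\\
\frac1{(n+1)^4} &= n^{-4} + O(n^{-5}).
\end{aligned}$$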

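Therefore the terms on the right of our approximation add up to

$$\begin{aligned}
&\ln 2 + \tfrac14 + \tfrac1{16} - \tfrac1{128} + \bigl(-1 - \tfrac12 + \tfrac12\bigr)n^{-1} + \bigl(\tfrac12 - \tfrac12 - \tfrac1{12} + \tfrac1{12}\bigr)n^{-2}\\
&\qquad + \bigl(-\tfrac13 + \tfrac12 - \tfrac2{12}\bigr)n^{-3} + \bigl(\tfrac14 - \tfrac12 + \tfrac3{12} + \tfrac1{120} - \tfrac1{120}\bigr)n^{-4} + R_4(n) + O(n^{-5})\\
&\quad = \ln 2 + \tfrac{39}{128} - n^{-1} + R_4(n) + O(n^{-5}).
\end{aligned}$$

The coefficients of $n^{-2}$, $n^{-3}$, and $n^{-4}$ cancel nicely, as they should.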

If all were well with the world, we would be able to show that $R_4(n)$ is
asymptotically small, maybe $O(n^{-5})$, and we would have an approximation
to the sum. But we can’t possibly show this, because we happen to know that
the correct constant term is 1, not $\ln 2 + \frac{39}{128}$ (which is approximately 0.9978).
So $R_4(n)$ is actually equal to $\frac{89}{128} - \ln 2 + O(n^{-4})$, but Euler’s summation
formula doesn’t tell us this.

In  other  words,  we  lose. 
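The size of the stubborn remainder can be seen numerically. The following is a minimal sketch, not part of the original derivation: it evaluates the $m = 4$ right-hand side without $R_4(n)$, subtracts it from the exact value $S_n = 1 - 1/n$, and compares the result with $\frac{89}{128} - \ln 2$. The function names `rhs_m4` and `remainder_r4` are illustrative choices.

```python
import math

def rhs_m4(n: int) -> float:
    """Right-hand side of the m = 4 expansion, with the remainder R_4(n) left out."""
    return (math.log(2 * n / (n + 1))
            - (1 / 2) * (1 / n - 1 / (n + 1) - 1 / 2)
            - (1 / 12) * (1 / n**2 - 1 / (n + 1)**2 - 3 / 4)
            + (1 / 120) * (1 / n**4 - 1 / (n + 1)**4 - 15 / 16))

def remainder_r4(n: int) -> float:
    """R_4(n), recovered from the exact telescoping value S_n = 1 - 1/n."""
    return (1 - 1 / n) - rhs_m4(n)

limit = 89 / 128 - math.log(2)   # the constant that R_4(n) approaches
for n in (10, 100, 1000):
    print(n, remainder_r4(n), remainder_r4(n) - limit)
```

The remainder settles near 0.002165 instead of shrinking to zero, which is exactly why the constant term $\ln 2 + \frac{39}{128}$ falls short of 1.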

One way to try fixing things is to notice that the constant terms in the
approximation form a pattern, if we let $m$ get larger and larger:

$$\ln 2 - \tfrac12 B_1 + \tfrac12\cdot\tfrac34 B_2 - \tfrac13\cdot\tfrac78 B_3 + \tfrac14\cdot\tfrac{15}{16} B_4 - \tfrac15\cdot\tfrac{31}{32} B_5 + \cdots.$$

Perhaps we can show that this series approaches 1 as the number of terms
becomes infinite? But no; the Bernoulli numbers get very large. For example,
$B_{22} = \frac{854513}{138} > 6192$; therefore $|R_{22}(n)|$ will be much larger than $|R_4(n)|$.
We lose totally.
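The growth of the Bernoulli numbers is easy to confirm with exact rational arithmetic. This is a small sketch under the convention $B_1 = -\frac12$ used in this chapter; the helper name `bernoulli_numbers` is an illustrative choice, and the recurrence $\sum_{j=0}^{m}\binom{m+1}{j}B_j = 0$ (for $m \ge 1$) is the standard one.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(limit: int) -> list[Fraction]:
    """Return [B_0, ..., B_limit] as exact fractions, with B_1 = -1/2."""
    b = [Fraction(1)]
    for m in range(1, limit + 1):
        # Solve sum_{j=0}^{m} C(m+1, j) B_j = 0 for B_m.
        s = sum(comb(m + 1, j) * b[j] for j in range(m))
        b.append(-s / (m + 1))
    return b

B = bernoulli_numbers(22)
for k in (2, 4, 6, 12, 22):
    print(k, B[k], float(B[k]))
```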

There is a way out, however, and this escape route will turn out to be
important in other applications of Euler’s formula. The key is to notice that
$R_4(n)$ approaches a definite limit as $n \to \infty$:

$$\lim_{n\to\infty} R_4(n) = -\int_1^\infty B_4(\{x\})\left(\frac1{x^5} - \frac1{(x+1)^5}\right)dx = R_4(\infty).$$

The integral $\int_1^\infty B_m(\{x\})\,f^{(m)}(x)\,dx$ will exist whenever $f^{(m)}(x) = O(x^{-2})$ as
$x \to \infty$, and in this case $f^{(4)}(x)$ surely qualifies. Moreover, we have

$$\begin{aligned}
R_4(n) &= R_4(\infty) + \int_n^\infty B_4(\{x\})\left(\frac1{x^5} - \frac1{(x+1)^5}\right)dx\\
&= R_4(\infty) + O\Bigl(\int_n^\infty x^{-6}\,dx\Bigr) = R_4(\infty) + O(n^{-5}).
\end{aligned}$$
Thus we have used Euler’s summation formula to prove that

$$\begin{aligned}
\sum_{1\le k<n}\frac1{k(k+1)} &= \ln 2 + \tfrac{39}{128} - n^{-1} + R_4(\infty) + O(n^{-5})\\
&= C - n^{-1} + O(n^{-5})
\end{aligned}$$

for some constant $C$. We do not know what the constant is (some other
method must be used to establish it), but Euler’s summation formula is able
to let us deduce that the constant exists.

Suppose we had chosen a much larger value of $m$. Then the same reasoning
would tell us that

$$R_m(n) = R_m(\infty) + O(n^{-m-1}),$$

and we would have the formula

$$\sum_{1\le k<n}\frac1{k(k+1)} = C - n^{-1} + c_2 n^{-2} + c_3 n^{-3} + \cdots + c_m n^{-m} + O(n^{-m-1})$$

for certain constants $c_2$, $c_3$, .... We know that the $c$’s happen to be zero
in this case; but let’s prove it, just to restore some of our confidence (in
Euler’s formula if not in ourselves). The term $\ln\frac{n}{n+1}$ contributes $(-1)^m/m$
to $c_m$; the term $(-1)^{m+1}(B_m/m)\,n^{-m}$ contributes $(-1)^{m+1}B_m/m$; and the
term $(-1)^k(B_k/k)(n+1)^{-k}$ contributes $(-1)^m\binom{m-1}{k-1}B_k/k$. Therefore

$$\begin{aligned}
(-1)^m c_m &= \frac1m - \frac{B_m}{m} + \sum_{k=1}^m \binom{m-1}{k-1}\frac{B_k}{k}\\
&= \frac1m - \frac{B_m}{m} + \frac1m\sum_{k=1}^m\binom{m}{k}B_k
= \frac1m\bigl(1 - B_m + B_m(1) - 1\bigr).
\end{aligned}$$

Sure enough, it’s zero, when $m > 1$. We have proved that

$$\sum_{1\le k<n}\frac1{k(k+1)} = C - n^{-1} + O(n^{-m-1}), \qquad\text{for all } m \ge 1. \tag{9.82}$$

This is not enough to prove that the sum is exactly equal to $C - n^{-1}$; the
actual value may be $C - n^{-1} + 2^{-n}$ or something. But Euler’s summation
formula does give us $O(n^{-m-1})$ for arbitrarily large $m$, even though we haven’t
evaluated any remainders explicitly.
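The cancellation of the coefficients $c_m$ can also be checked mechanically. The sketch below adds up the three contributions named above (each multiplied by $(-1)^m$) with exact fractions; `bernoulli_numbers` and `signed_c` are hypothetical helper names, and the convention $B_1 = -\frac12$ is assumed.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(limit):
    """Return [B_0, ..., B_limit] as exact fractions, with B_1 = -1/2."""
    b = [Fraction(1)]
    for m in range(1, limit + 1):
        # Recurrence: sum_{j=0}^{m} C(m+1, j) B_j = 0 for m >= 1.
        b.append(-sum(comb(m + 1, j) * b[j] for j in range(m)) / (m + 1))
    return b

def signed_c(m, B):
    """(-1)^m c_m, assembled from the three contributions to the n^{-m} coefficient."""
    from_log = Fraction(1, m)      # from ln(n/(n+1))
    from_n_power = -B[m] / m       # from (-1)^{m+1} (B_m/m) n^{-m}
    # from the terms (-1)^k (B_k/k) (n+1)^{-k}, k = 1..m
    from_shifted = sum(comb(m - 1, k - 1) * B[k] / k for k in range(1, m + 1))
    return from_log + from_n_power + from_shifted

B = bernoulli_numbers(20)
print([signed_c(m, B) for m in range(1, 11)])
```

Only the $m = 1$ entry is nonzero; it equals 1, so $c_1 = -1$, matching the $-n^{-1}$ term.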

Summation  1,  again:  Recapitulation  and  generalization. 

Before  we  leave  our  training  wheels,  let’s  review  what  we  just  did  from 
a somewhat  higher  perspective.  We  began  with  a sum 

$$S_n = \sum_{1\le k<n} f(k)$$

and we used Euler’s summation formula to write

$$S_n = F(n) - F(1) + \sum_{k=1}^m \bigl(T_k(n) - T_k(1)\bigr) + R_m(n), \tag{9.83}$$

where $F(x)$ was $\int f(x)\,dx$ and where $T_k(x)$ was a certain term involving $B_k$ and
$f^{(k-1)}(x)$. We also noticed that there was a constant $c$ such that

$$f^{(m)}(x) = O(x^{c-m}) \quad\text{as } x \to \infty, \quad\text{for all large } m.$$

(Namely, $f(k)$ was $1/k(k+1)$; $F(x)$ was $\ln(x/(x+1))$; $T_k(x)$ was
$(-1)^{k+1}(B_k/k)(x^{-k} - (x+1)^{-k})$; and $c$ was $-2$.) For all large enough values of $m$,
this implied that the remainders had a small tail,

$$R'_m(n) = R_m(\infty) - R_m(n) = (-1)^{m+1}\int_n^\infty \frac{B_m(\{x\})}{m!}\,f^{(m)}(x)\,dx = O(n^{c+1-m}). \tag{9.84}$$

Therefore we were able to conclude that there exists a constant $C$ such that

$$S_n = F(n) + C + \sum_{k=1}^m T_k(n) - R'_m(n). \tag{9.85}$$

(Notice that $C$ nicely absorbed the $T_k(1)$ terms, which were a nuisance.)

We can save ourselves unnecessary work in future problems by simply
asserting the existence of $C$ whenever $R_m(\infty)$ exists.

Now let’s suppose that $f^{(2m+2)}(x) \ge 0$ and $f^{(2m+4)}(x) \ge 0$ for $1 \le x \le n$.
We have proved that this implies a simple bound (9.80) on the remainder,

$$R_{2m}(n) = \theta_{m,n}\bigl(T_{2m+2}(n) - T_{2m+2}(1)\bigr),$$

where $\theta_{m,n}$ lies somewhere between 0 and 1. But we don’t really want bounds
that involve $R_{2m}(n)$ and $T_{2m+2}(1)$; after all, we got rid of $T_k(1)$ when we
introduced the constant $C$. What we really want is a bound like

$$-R'_{2m}(n) = \varphi_{m,n}\,T_{2m+2}(n),$$

where $0 < \varphi_{m,n} < 1$; this will allow us to conclude from (9.85) that

$$S_n = F(n) + C + T_1(n) + \sum_{k=1}^m T_{2k}(n) + \varphi_{m,n}\,T_{2m+2}(n), \tag{9.86}$$

hence the remainder will truly be between zero and the first discarded term.

A slight modification of our previous argument will patch things up
perfectly. Let us assume that

$$f^{(2m+2)}(x) \ge 0 \quad\text{and}\quad f^{(2m+4)}(x) \ge 0, \quad\text{as } x \to \infty. \tag{9.87}$$

The right-hand side of (9.85) is just like the negative of the right-hand side of
Euler’s summation formula (9.67) with $a = n$ and $b = \infty$, as far as remainder
terms are concerned, and successive remainders are generated by induction
on $m$. Therefore our previous argument can be applied.

Summation  2:  Harmonic  numbers  harmonized. 

Now  that  we’ve  learned  so  much  from  a trivial  (but  safe)  example,  we  can 
readily  do  a nontrivial  one.  Let  us  use  Euler’s  summation  formula  to  derive 
the  approximation  for  Hn  that  we  have  been  claiming  for  some  time. 

In this case, $f(x) = 1/x$. We already know about the integral and derivatives
of $f$, because of Summation 1; also $f^{(m)}(x) = O(x^{-m-1})$ as $x \to \infty$.
Therefore we can immediately plug into formula (9.85):

$$\sum_{1\le k<n}\frac1k = \ln n + C + B_1 n^{-1} - \sum_{k=1}^m \frac{B_{2k}}{2k\,n^{2k}} - R'_{2m}(n),$$

for some constant $C$. The sum on the left is $H_{n-1}$, not $H_n$; but it’s more
convenient to work with $H_{n-1}$ and to add $1/n$ later, than to mess around with
$(n+1)$’s on the right-hand side. The $B_1 n^{-1}$ will then become $(B_1 + 1)n^{-1} =
1/(2n)$. Let us call the constant $\gamma$ instead of $C$, since Euler’s constant $\gamma$ is,
in fact, defined to be $\lim_{n\to\infty}(H_n - \ln n)$.

The remainder term can be estimated nicely by the theory we developed
a minute ago, because $f^{(2m)}(x) = (2m)!/x^{2m+1} \ge 0$ for all $x > 0$. Therefore
(9.86) tells us that

$$H_n = \ln n + \gamma + \frac1{2n} - \sum_{k=1}^m \frac{B_{2k}}{2k\,n^{2k}} - \theta_{m,n}\,\frac{B_{2m+2}}{(2m+2)\,n^{2m+2}}, \tag{9.88}$$


where $\theta_{m,n}$ is some fraction between 0 and 1. This is the general formula
whose first few terms are listed in Table 438. For example, when $m = 2$ we get

$$H_n = \ln n + \gamma + \frac1{2n} - \frac1{12n^2} + \frac1{120n^4} - \frac{\theta_{2,n}}{252n^6}. \tag{9.89}$$

This equation, incidentally, gives us a good approximation to $\gamma$ even when
$n = 2$:

$$\gamma = H_2 - \ln 2 - \tfrac14 + \tfrac1{48} - \tfrac1{1920} + \epsilon = 0.577165\ldots + \epsilon,$$

where $\epsilon$ is between zero and $\frac1{16128}$. If we take $n = 10^4$ and $m = 250$, we get
the value of $\gamma$ correct to 1271 decimal places, beginning thus [171]:

$$\gamma = 0.57721\,56649\,01532\,86060\,65120\,90082\,40243\ldots. \tag{9.90}$$

But Euler’s constant appears also in other formulas that allow it to be
evaluated even more efficiently [282].
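Equation (9.89) can be tried out directly. The following sketch drops the $\theta_{2,n}$ term, estimates $\gamma$ from $H_n$, and prints the gap next to the guaranteed bound $1/(252n^6)$. The constant `EULER_GAMMA` is a double-precision reference value consistent with the digits in (9.90); the function name `gamma_estimate` is illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # reference value, rounded from the digits in (9.90)

def gamma_estimate(n: int) -> float:
    """Estimate of Euler's constant from (9.89), with the theta term dropped."""
    h_n = math.fsum(1 / k for k in range(1, n + 1))
    return h_n - math.log(n) - 1 / (2 * n) + 1 / (12 * n**2) - 1 / (120 * n**4)

for n in (2, 5, 10):
    est = gamma_estimate(n)
    print(n, est, EULER_GAMMA - est, 1 / (252 * n**6))
```

In each row the third column (the true gap) is positive and smaller than the fourth column, as the bound $0 < \theta_{2,n} < 1$ promises.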

Summation  3:  Stirling’s  approximation. 

If $f(x) = \ln x$, we have $f'(x) = 1/x$, so we can evaluate the sum of
logarithms using almost the same calculations as we did when summing
reciprocals. Euler’s summation formula yields

$$\sum_{1\le k<n}\ln k = n\ln n - n + \sigma - \frac{\ln n}{2} + \sum_{k=1}^m \frac{B_{2k}}{2k(2k-1)\,n^{2k-1}} + \varphi_{m,n}\,\frac{B_{2m+2}}{(2m+2)(2m+1)\,n^{2m+1}},$$

where $\sigma$ is a certain constant, “Stirling’s constant,” and $0 < \varphi_{m,n} < 1$. (In this
case $f^{(2m)}(x)$ is negative, not positive; but we can still say that the remainder
is governed by the first discarded term, because we could have started with
$f(x) = -\ln x$ instead of $f(x) = \ln x$.) Adding $\ln n$ to both sides gives


Heisenberg  may 
have  been  here. 


$$\ln n! = n\ln n - n + \frac{\ln n}{2} + \sigma + \frac1{12n} - \frac1{360n^3} + \frac{\varphi_{2,n}}{1260n^5} \tag{9.91}$$

when $m = 2$. And we can get the approximation in Table 438 by taking ‘exp’
of both sides. (The value of $e^\sigma$ turns out to be $\sqrt{2\pi}$, but we aren’t quite ready
to derive that formula. In fact, Stirling didn’t discover the closed form for $\sigma$
until several years after de Moivre [64] had proved that the constant exists.)

If $m$ is fixed and $n \to \infty$, the general formula gives a better and better
approximation to $\ln n!$ in the sense of absolute error, hence it gives a better
and better approximation to $n!$ in the sense of relative error. But if $n$ is fixed
and $m$ increases, the error bound $|B_{2m+2}|/(2m+2)(2m+1)n^{2m+1}$ decreases
to a certain point and then begins to increase. Therefore the approximation
reaches a point beyond which a sort of uncertainty principle limits the amount
by which $n!$ can be approximated.
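The turning point of the error bound can be located by brute force. This sketch tabulates $|B_{2m+2}|/(2m+2)(2m+1)n^{2m+1}$ exactly for a fixed $n$ and reports the $m$ that minimizes it; the helper names and the search limit `max_m` are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(limit):
    """Return [B_0, ..., B_limit] as exact fractions, with B_1 = -1/2."""
    b = [Fraction(1)]
    for m in range(1, limit + 1):
        # Recurrence: sum_{j=0}^{m} C(m+1, j) B_j = 0 for m >= 1.
        b.append(-sum(comb(m + 1, j) * b[j] for j in range(m)) / (m + 1))
    return b

def error_bound(n, m, B):
    """|B_{2m+2}| / ((2m+2)(2m+1) n^(2m+1)), the size of the first discarded term."""
    return abs(B[2 * m + 2]) / ((2 * m + 2) * (2 * m + 1) * Fraction(n) ** (2 * m + 1))

def best_m(n, max_m=40):
    """Value of m in 0..max_m that minimizes the error bound for this n."""
    B = bernoulli_numbers(2 * max_m + 2)
    return min(range(max_m + 1), key=lambda m: error_bound(n, m, B))

for n in (1, 2, 3, 5):
    m = best_m(n)
    B = bernoulli_numbers(2 * m + 2)
    print(n, m, float(error_bound(n, m, B)))
```

For $n = 2$ the bound bottoms out at $m = 6$ (about $7.8\times10^{-7}$) and grows from there, so adding more terms beyond that point only hurts.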
In Chapter 5, equation (5.83), we generalized factorials to arbitrary real $\alpha$
by using a definition

$$\frac1{\alpha!} = \lim_{n\to\infty}\binom{n+\alpha}{n}\,n^{-\alpha}$$

suggested by Euler. Suppose $\alpha$ is a large number; then

$$\ln\alpha! = \lim_{n\to\infty}\Bigl(\alpha\ln n + \ln n! - \sum_{k=1}^n \ln(\alpha+k)\Bigr),$$

and Euler’s summation formula can be used with $f(x) = \ln(x+\alpha)$ to estimate
this sum:

$$\begin{aligned}
\sum_{k=1}^n \ln(k+\alpha) &= F_m(\alpha,n) - F_m(\alpha,0) + R_{2m}(\alpha,n),\\
F_m(\alpha,x) &= (x+\alpha)\ln(x+\alpha) - x + \frac{\ln(x+\alpha)}{2} + \sum_{k=1}^m \frac{B_{2k}}{2k(2k-1)(x+\alpha)^{2k-1}},\\
R_{2m}(\alpha,n) &= \int_0^n \frac{B_{2m}(\{x\})}{2m}\,\frac{dx}{(x+\alpha)^{2m}}.
\end{aligned}$$

(Here we have used (9.67) with $a = 0$ and $b = n$, then added $\ln(n+\alpha) - \ln\alpha$
to both sides.) If we subtract this approximation for $\sum_{k=1}^n \ln(k+\alpha)$
from Stirling’s approximation for $\ln n!$, then add $\alpha\ln n$ and take the limit as
$n \to \infty$, we get


$$\ln\alpha! = \alpha\ln\alpha - \alpha + \frac{\ln\alpha}{2} + \sigma + \sum_{k=1}^m \frac{B_{2k}}{(2k)(2k-1)\,\alpha^{2k-1}} - \int_0^\infty \frac{B_{2m}(\{x\})}{2m}\,\frac{dx}{(x+\alpha)^{2m}},$$

because $\alpha\ln n + n\ln n - n + \frac12\ln n - (n+\alpha)\ln(n+\alpha) + n - \frac12\ln(n+\alpha) \to -\alpha$ and
the other terms not shown here tend to zero. Thus Stirling’s approximation
behaves for generalized factorials (and for the Gamma function $\Gamma(\alpha+1) = \alpha!$)
exactly as for ordinary factorials.
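A quick numerical comparison illustrates the claim for both integer and non-integer arguments. The sketch below truncates the series as in (9.91) and compares it with `math.lgamma(a + 1)`, which computes $\ln\Gamma(a+1) = \ln a!$. It assumes the value $\sigma = \frac12\ln 2\pi$ quoted above (and derived in Summation 5); `ln_factorial_stirling` is an illustrative name.

```python
import math

# Stirling's constant; the text quotes e^sigma = sqrt(2*pi) and proves it in Summation 5.
SIGMA = 0.5 * math.log(2 * math.pi)

def ln_factorial_stirling(a: float) -> float:
    """Stirling's series for ln a!, truncated as in (9.91) (m = 2, remainder dropped)."""
    return (a * math.log(a) - a + 0.5 * math.log(a) + SIGMA
            + 1 / (12 * a) - 1 / (360 * a**3))

for a in (10, 10.5, 25.25):
    exact = math.lgamma(a + 1)          # ln Gamma(a + 1) = ln a!
    approx = ln_factorial_stirling(a)
    print(a, exact - approx, 1 / (1260 * a**5))
```

In each case the gap is positive and below $1/(1260a^5)$, the first discarded term.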

Summation  4:  A bell-shaped  summand. 

Let’s  turn  now  to  a sum  that  has  quite  a different  flavor: 

$$\begin{aligned}
\Theta_n &= \sum_k e^{-k^2/n}\\
&= \cdots + e^{-9/n} + e^{-4/n} + e^{-1/n} + 1 + e^{-1/n} + e^{-4/n} + e^{-9/n} + \cdots.
\end{aligned} \tag{9.92}$$

This is a doubly infinite sum, whose terms reach their maximum value $e^0 = 1$
when $k = 0$. We call it $\Theta_n$ because it is a power series involving the quantity
$e^{-1/n}$ raised to the $p(k)$th power, where $p(k)$ is a polynomial of degree 2; such
power series are traditionally called “theta functions.” If $n = 10^{100}$, we have

$$e^{-k^2/n} = \begin{cases}
e^{-.01} \approx 0.99005, & \text{when } k = 10^{49};\\
e^{-1} \approx 0.36788, & \text{when } k = 10^{50};\\
e^{-100} < 10^{-43}, & \text{when } k = 10^{51}.
\end{cases}$$

So the summand stays very near 1 until $k$ gets up to about $\sqrt n$, when it
drops off and stays very near zero. We can guess that $\Theta_n$ will be proportional
to $\sqrt n$. Here is a graph of $e^{-k^2/n}$ when $n = 10$:


Larger values of $n$ just stretch the graph horizontally by a factor of $\sqrt n$.

We can estimate $\Theta_n$ by letting $f(x) = e^{-x^2/n}$ and taking $a = -\infty$, $b =
+\infty$ in Euler’s summation formula. (If infinities seem too scary, let $a = -A$
and $b = +B$, then take limits as $A, B \to \infty$.) The integral of $f(x)$ is


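$$\int_{-\infty}^{+\infty} e^{-x^2/n}\,dx = \sqrt n\int_{-\infty}^{+\infty} e^{-u^2}\,du = \sqrt n\,C,$$

if we replace $x$ by $u\sqrt n$. The value of $\int_{-\infty}^{+\infty} e^{-u^2}\,du$ is well known, but we’ll
call it $C$ for now and come back to it after we have finished plugging into
Euler’s summation formula.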

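The next thing we need to know is the sequence of derivatives $f'(x)$,
$f''(x)$, ..., and for this purpose it’s convenient to set

$$f(x) = g(x/\sqrt n), \qquad g(x) = e^{-x^2}.$$

Then the chain rule of calculus says that

$$\frac{df(x)}{dx} = \frac{dg(y)}{dy}\,\frac{dy}{dx}, \qquad y = \frac{x}{\sqrt n};$$

and this is the same as saying that

$$f'(x) = \frac1{\sqrt n}\,g'(x/\sqrt n).$$

By induction we have

$$f^{(k)}(x) = n^{-k/2}\,g^{(k)}(x/\sqrt n).$$

For example, we have $g'(x) = -2xe^{-x^2}$ and $g''(x) = (4x^2 - 2)e^{-x^2}$; hence

$$f'(x) = \frac1{\sqrt n}\Bigl(-2\,\frac{x}{\sqrt n}\Bigr)e^{-x^2/n}, \qquad f''(x) = \frac1n\Bigl(4\Bigl(\frac{x}{\sqrt n}\Bigr)^{2} - 2\Bigr)e^{-x^2/n}.$$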
It’s  easier  to  see  what’s  going  on  if  we  work  with  the  simpler  function  g(x). 

We don’t have to evaluate the derivatives of $g(x)$ exactly, because we’re
only going to be concerned about the limiting values when $x = \pm\infty$. And for
this purpose it suffices to notice that every derivative of $g(x)$ is $e^{-x^2}$ times a
polynomial in $x$:

$$g^{(k)}(x) = P_k(x)\,e^{-x^2}, \qquad\text{where } P_k \text{ is a polynomial of degree } k.$$

This follows by induction.

The negative exponential $e^{-x^2}$ goes to zero much faster than $P_k(x)$ goes
to infinity, when $x \to \pm\infty$, so we have

$$f^{(k)}(+\infty) = f^{(k)}(-\infty) = 0$$


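for all $k \ge 0$. Therefore all of the terms

$$\sum_{k=1}^m \frac{B_k}{k!}\,f^{(k-1)}(x)\Big|_{-\infty}^{+\infty}$$

vanish, and we are left with the term from $\int f(x)\,dx$ and the remainder:

$$\begin{aligned}
\Theta_n &= C\sqrt n + (-1)^{m+1}\int_{-\infty}^{+\infty}\frac{B_m(\{x\})}{m!}\,f^{(m)}(x)\,dx\\
&= C\sqrt n + \frac{(-1)^{m+1}}{n^{m/2}}\int_{-\infty}^{+\infty}\frac{B_m(\{x\})}{m!}\,g^{(m)}\Bigl(\frac{x}{\sqrt n}\Bigr)dx\\
&= C\sqrt n + \frac{(-1)^{m+1}}{n^{(m-1)/2}}\int_{-\infty}^{+\infty}\frac{B_m(\{u\sqrt n\})}{m!}\,P_m(u)\,e^{-u^2}\,du \qquad (x = u\sqrt n)\\
&= C\sqrt n + O(n^{(1-m)/2}).
\end{aligned}$$

The $O$ estimate here follows since $|B_m(\{u\sqrt n\})|$ is bounded and the integral
$\int_{-\infty}^{+\infty}|P(u)|\,e^{-u^2}\,du$ exists whenever $P$ is a polynomial. (The constant implied
by this $O$ depends on $m$.)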

We have proved that $\Theta_n = C\sqrt n + O(n^{-M})$, for arbitrarily large $M$; the
difference between $\Theta_n$ and $C\sqrt n$ is “exponentially small.” Let us therefore
determine the constant $C$ that plays such a big role in the value of $\Theta_n$.

One way to determine $C$ is to look the integral up in a table; but we
prefer to know how the value can be derived, so that we can do integrals even
when they haven’t been tabulated. Elementary calculus suffices to evaluate $C$
if we are clever enough to look at the double integral

$$C^2 = \int_{-\infty}^{+\infty} e^{-x^2}\,dx\int_{-\infty}^{+\infty} e^{-y^2}\,dy = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} e^{-(x^2+y^2)}\,dx\,dy.$$

Converting to polar coordinates gives

$$C^2 = \int_0^{2\pi}\!\int_0^\infty e^{-r^2}\,r\,dr\,d\theta = \frac12\int_0^{2\pi}d\theta\int_0^\infty e^{-u}\,du \quad (u = r^2) \;=\; \frac12\int_0^{2\pi}d\theta = \pi.$$

So $C = \sqrt\pi$. The fact that $x^2 + y^2 = r^2$ is the equation of a circle whose
circumference is $2\pi r$ somehow explains why $\pi$ gets into the act.

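Another way to evaluate $C$ is to replace $x$ by $\sqrt t$ and $dx$ by $\frac12 t^{-1/2}\,dt$:

$$C = \int_{-\infty}^{+\infty} e^{-x^2}\,dx = 2\int_0^\infty e^{-x^2}\,dx = \int_0^\infty t^{-1/2}e^{-t}\,dt.$$

This integral equals $\Gamma(\frac12)$, since $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1}e^{-t}\,dt$ according to (5.84).
Therefore we have demonstrated that $\Gamma(\frac12) = \sqrt\pi$.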

Our final formula, then, is

$$\Theta_n = \sum_k e^{-k^2/n} = \sqrt{\pi n} + O(n^{-M}), \qquad\text{for all fixed } M. \tag{9.93}$$

The constant in the $O$ depends on $M$; that’s why we say that $M$ is “fixed.”
When $n = 2$, for example, the infinite sum $\Theta_2$ is equal to 2.506628288;
this is already an excellent approximation to $\sqrt{2\pi} = 2.506628275$, even though
$n$ is quite small. The value of $\Theta_{100}$ agrees with $10\sqrt\pi$ to 427 decimal places!
Exercise 59 uses advanced methods to derive a rapidly convergent series
for $\Theta_n$; it turns out that

$$\Theta_n/\sqrt{\pi n} = 1 + 2e^{-n\pi^2} + O(e^{-4n\pi^2}). \tag{9.94}$$
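Both (9.93) and (9.94) are easy to test in floating point. The sketch below sums the series directly (truncated at $|k| \le 200$, an arbitrary cutoff that is far more than enough for small $n$) and compares $\Theta_n/\sqrt{\pi n} - 1$ with $2e^{-n\pi^2}$. The function name `theta` and the cutoff parameter `terms` are illustrative.

```python
import math

def theta(n: float, terms: int = 200) -> float:
    """Doubly infinite sum of exp(-k^2/n), truncated at |k| <= terms."""
    tail = math.fsum(math.exp(-k * k / n) for k in range(1, terms + 1))
    return 1.0 + 2.0 * tail   # k = 0 term plus the two symmetric halves

for n in (1, 2, 3):
    ratio_minus_one = theta(n) / math.sqrt(math.pi * n) - 1
    print(n, theta(n), ratio_minus_one, 2 * math.exp(-n * math.pi**2))
```

Already at $n = 1$ the last two columns agree to many digits, which shows how small the $O(e^{-4n\pi^2})$ correction is.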

Summation  5:  The  clincher. 

Now we will do one last sum, which will turn out to tell us the value
of Stirling’s constant $\sigma$. This last sum also illustrates many of the other
techniques of this last chapter (and of this whole book), so it will be a fitting
way for us to conclude our explorations of Concrete Mathematics.

The final task seems almost absurdly easy: We will try to find the
asymptotic value of

$$A_n = \sum_k \binom{2n}{k}$$

by using Euler’s summation formula.

This  is  another  case  where  we  already  know  the  answer  (right?);  but 
it’s  always  interesting  to  try  new  methods  on  old  problems,  so  that  we  can 
compare  facts  and  maybe  discover  something  new. 

So we think big and realize that the main contribution to $A_n$ comes
from the middle terms, near $k = n$. It’s almost always a good idea to choose
notation so that the biggest contribution to a sum occurs near $k = 0$, because
we can then use the tail-exchange trick to get rid of terms that have large $|k|$.
Therefore we replace $k$ by $n + k$:

$$A_n = \sum_k \binom{2n}{n+k} = \sum_k \frac{(2n)!}{(n+k)!\,(n-k)!}.$$

Things are looking reasonably good, since we know how to approximate $(n \pm k)!$
when $n$ is large and $k$ is small.

Now we want to carry out the three-step procedure associated with the
tail-exchange trick. Namely, we want to write

$$\frac{(2n)!}{(n+k)!\,(n-k)!} = a_k(n) = b_k(n) + O\bigl(c_k(n)\bigr), \qquad\text{for } k \in D_n,$$

so that we can obtain the estimate

$$A_n = \sum_k b_k(n) + O\Bigl(\sum_{k\notin D_n} a_k(n)\Bigr) + O\Bigl(\sum_{k\notin D_n} b_k(n)\Bigr) + \sum_{k\in D_n} O\bigl(c_k(n)\bigr).$$

Let us therefore try to estimate $\binom{2n}{n+k}$ in the region where $|k|$ is small. We
could use Stirling’s approximation as it appears in Table 438, but it’s easier
to work with the logarithmic equivalent in (9.91):


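$$\begin{aligned}
\ln a_k(n) &= \ln(2n)! - \ln(n+k)! - \ln(n-k)!\\
&= 2n\ln 2n - 2n + \tfrac12\ln 2n + \sigma + O(n^{-1})\\
&\quad - (n+k)\ln(n+k) + n + k - \tfrac12\ln(n+k) - \sigma + O\bigl((n+k)^{-1}\bigr)\\
&\quad - (n-k)\ln(n-k) + n - k - \tfrac12\ln(n-k) - \sigma + O\bigl((n-k)^{-1}\bigr).
\end{aligned} \tag{9.95}$$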


We want to convert this to a nice, simple $O$ estimate.

The tail-exchange method allows us to work with estimates that are valid
only when $k$ is in the “dominant” set $D_n$. But how should we define $D_n$?

Actually I’m not
into dominance.

We have to make $D_n$ small enough that we can make a good estimate; for
example, we had better not let $k$ get near $n$, or the term $O((n-k)^{-1})$ in
(9.95) will blow up. Yet $D_n$ must be large enough that the tail terms (the
terms with $k \notin D_n$) are negligibly small compared with the overall sum. Trial
and error is usually necessary to find an appropriate set $D_n$; in this problem
the calculations we are about to make will show that it’s wise to define things
as follows:

$$k \in D_n \iff |k| \le n^{1/2+\epsilon}. \tag{9.96}$$

Here $\epsilon$ is a small positive constant that we can choose later, after we get to
know the territory. (Our $O$ estimates will depend on the value of $\epsilon$.) Equation
(9.95) now reduces to

$$\begin{aligned}
\ln a_k(n) &= (2n + \tfrac12)\ln 2 - \sigma - \tfrac12\ln n + O(n^{-1})\\
&\quad - (n+k+\tfrac12)\ln(1+k/n) - (n-k+\tfrac12)\ln(1-k/n).
\end{aligned} \tag{9.97}$$

(We have pulled out the large parts of the logarithms, writing

$$\ln(n \pm k) = \ln n + \ln(1 \pm k/n),$$

and this has made a lot of $\ln n$ terms cancel out.)

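Now we need to expand the terms $\ln(1 \pm k/n)$ asymptotically, until we
have an error term that approaches zero as $n \to \infty$. We are multiplying
$\ln(1 \pm k/n)$ by $(n \pm k + \frac12)$, so we should expand the logarithm until we reach
$o(n^{-1})$, using the assumption that $|k| \le n^{1/2+\epsilon}$:

$$\ln\Bigl(1 \pm \frac kn\Bigr) = \pm\frac kn - \frac{k^2}{2n^2} + O(n^{-3/2+3\epsilon}).$$

Multiplication by $n \pm k + \frac12$ yields

$$\pm k - \frac{k^2}{2n} + \frac{k^2}{n} + O(n^{-1/2+3\epsilon}),$$

plus other terms that are absorbed in the $O(n^{-1/2+3\epsilon})$. So (9.97) becomes

$$\ln a_k(n) = (2n + \tfrac12)\ln 2 - \sigma - \tfrac12\ln n - k^2/n + O(n^{-1/2+3\epsilon}).$$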

Taking exponentials, we have

$$a_k(n) = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\,e^{-k^2/n}\bigl(1 + O(n^{-1/2+3\epsilon})\bigr). \tag{9.98}$$

This is our approximation, with

$$b_k(n) = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\,e^{-k^2/n}, \qquad c_k(n) = 2^{2n}\,n^{-1+3\epsilon}\,e^{-k^2/n}.$$

Notice that $k$ enters $b_k(n)$ and $c_k(n)$ in a very simple way. We’re in luck,
because we will be summing over $k$.

The tail-exchange trick tells us that $\sum_k a_k(n)$ will be approximately
$\sum_k b_k(n)$ if we have done a good job of estimation. Let us therefore evaluate

$$\sum_k b_k(n) = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\sum_k e^{-k^2/n} = \frac{2^{2n+1/2}}{e^\sigma\sqrt n}\,\Theta_n = \frac{2^{2n}\sqrt{2\pi}}{e^\sigma}\bigl(1 + O(n^{-M})\bigr).$$

(Another stroke of luck: We get to use the sum $\Theta_n$ from the previous
example.) This is encouraging, because we know that the original sum is actually

$$A_n = \sum_k \binom{2n}{k} = (1+1)^{2n} = 2^{2n}.$$

What an amazing
coincidence.

Therefore it looks as if we will have $e^\sigma = \sqrt{2\pi}$, as advertised.

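But there’s a catch: We still need to prove that our estimates are good
enough. So let’s look first at the error contributed by $c_k(n)$:

$$\Sigma_c(n) = \sum_{|k|\le n^{1/2+\epsilon}} 2^{2n}\,n^{-1+3\epsilon}\,e^{-k^2/n} \le 2^{2n}\,n^{-1+3\epsilon}\,\Theta_n = O\bigl(2^{2n}\,n^{-\frac12+3\epsilon}\bigr).$$

Good; this is asymptotically smaller than the previous sum, if $3\epsilon < \frac12$.

Next we must check the tails. We have

$$\begin{aligned}
\sum_{k>n^{1/2+\epsilon}} e^{-k^2/n} &< \exp\bigl(-\lfloor n^{1/2+\epsilon}\rfloor^2/n\bigr)\,\bigl(1 + e^{-1/n} + e^{-2/n} + \cdots\bigr)\\
&= O(e^{-n^{2\epsilon}})\cdot O(n),
\end{aligned}$$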

which is $O(n^{-M})$ for all $M$; so $\sum_{k\notin D_n} b_k(n)$ is asymptotically negligible. (We
chose the cutoff at $n^{1/2+\epsilon}$ just so that $e^{-k^2/n}$ would be exponentially small
outside of $D_n$. Other choices like $n^{1/2}\log n$ would have been good enough
too, and the resulting estimates would have been slightly sharper, but the
formulas would have come out more complicated. We need not make the
strongest possible estimates, since our main goal is to establish the value of
the constant $\sigma$.) Similarly, the other tail

$$\sum_{k>n^{1/2+\epsilon}}\binom{2n}{n+k}$$

is bounded by $2n$ times its largest term, which occurs at the cutoff point $k \approx
n^{1/2+\epsilon}$. This term is known to be approximately $b_k(n)$, which is exponentially
small compared with $A_n$; and an exponentially small multiplier wipes out the
factor of $2n$.

I’m tired of getting
to the end of long,
hard books and not
even getting a word
of good wishes from
the author. It would
be nice to read a
“thanks for reading
this, hope it comes
in handy,” instead
of just running into
a hard, cold, cardboard
cover at the
end of a long, dry
proof. You know?

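Thus we have successfully applied the tail-exchange trick to prove the
estimate

$$2^{2n} = \sum_k \binom{2n}{k} = \frac{\sqrt{2\pi}}{e^\sigma}\,2^{2n} + O\bigl(2^{2n}\,n^{-\frac12+3\epsilon}\bigr), \qquad\text{if } 0 < \epsilon < \tfrac16. \tag{9.99}$$

We may choose $\epsilon = \frac18$ and conclude that

$$\sigma = \tfrac12\ln 2\pi.$$

QED.

Thanks for reading
this, hope it comes
in handy.
- The authors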
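With $e^\sigma = \sqrt{2\pi}$ in hand, the approximation (9.98) reads $b_k(n) = 4^n e^{-k^2/n}/\sqrt{\pi n}$, and it can be compared with the true binomial coefficients. The sketch below works with logarithms (via `math.lgamma`) so that nothing overflows; the function names `ln_a` and `ln_b` are illustrative, and the choice $n = 1000$ is arbitrary.

```python
import math

def ln_a(n: int, k: int) -> float:
    """ln of a_k(n) = (2n)! / ((n+k)! (n-k)!), via log-gamma to avoid overflow."""
    return math.lgamma(2 * n + 1) - math.lgamma(n + k + 1) - math.lgamma(n - k + 1)

def ln_b(n: int, k: int) -> float:
    """ln of b_k(n) from (9.98) with e^sigma = sqrt(2*pi): 4^n e^{-k^2/n} / sqrt(pi*n)."""
    return 2 * n * math.log(2) - 0.5 * math.log(math.pi * n) - k * k / n

n = 1000
for k in (0, 10, 30):
    print(k, ln_a(n, k) - ln_b(n, k))   # log of the ratio a_k(n) / b_k(n)

# The exact identity behind A_n, checked for a small case with integer arithmetic.
print(sum(math.comb(40, k) for k in range(41)) == 4 ** 20)
```

Inside the dominant set the logarithm of the ratio stays within a few times $10^{-4}$, consistent with a relative error that tends to zero as $n$ grows.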

Exercises 

Warmups 

1 Prove or disprove: If $f_1(n) \prec g_1(n)$ and $f_2(n) \prec g_2(n)$, then we have
$f_1(n) + f_2(n) \prec g_1(n) + g_2(n)$.

2 Which function grows faster:
a $n^{(\ln n)}$ or $(\ln n)^n$?
b $n^{(\ln\ln\ln n)}$ or $(\ln n)!$?
c $(n!)!$ or $((n-1)!)!\,(n-1)!^{\,n!}$?
d $F^2_{\lceil H_n\rceil}$ or $H_{F_n}$?

3 What’s wrong with the following argument? “Since $n = O(n)$ and $2n =
O(n)$ and so on, we have $\sum_{k=1}^n kn = \sum_{k=1}^n O(n) = O(n^2)$.”

4 Give an example of a valid equation that has O-notation on the left but
not on the right. (Do not use the trick of multiplying by zero; that’s too
easy.) Hint: Consider taking limits.

5 Prove or disprove: $O(f(n) + g(n)) = f(n) + O(g(n))$, if $f(n)$ and $g(n)$
are positive for all $n$. (Compare with (9.27).)

6 Multiply $(\ln n + \gamma + O(1/n))$ by $(n + O(\sqrt n))$, and express your answer
in O-notation.

7 Estimate $\sum_{k\ge0} e^{-k/n}$ with absolute error $O(n^{-1})$.

Basics 

8 Give an example of functions $f(n)$ and $g(n)$ such that none of the three
relations $f(n) \prec g(n)$, $f(n) \succ g(n)$, $f(n) \asymp g(n)$ is valid, although $f(n)$
and $g(n)$ both increase monotonically to $\infty$.

9 Prove (9.22) rigorously by showing that the left side is a subset of the
right side, according to the set-of-functions definition of $O$.

10 Prove or disprove: $\cos O(x) = 1 + O(x^2)$ for all real $x$.

11 Prove or disprove: $O(x+y)^2 = O(x^2) + O(y^2)$.

12 Prove that

$$1 + \frac2n + O(n^{-2}) = \Bigl(1 + \frac2n\Bigr)\bigl(1 + O(n^{-2})\bigr),$$

as $n \to \infty$.

13 Evaluate $(n + 2 + O(n^{-1}))^n$ with relative error $O(n^{-1})$.

14 Prove that $(n+\alpha)^{n+\beta} = n^{n+\beta}e^{\alpha}\bigl(1 + \alpha(\beta - \frac12\alpha)n^{-1} + O(n^{-2})\bigr)$.

15 Give an asymptotic formula for the “middle” trinomial coefficient $\binom{3n}{n,n,n}$,
correct to relative error $O(n^{-3})$.

16 Show that if $B(1-x) = -B(x) \ge 0$ for $0 < x < \frac12$, we have

$$\int_a^b B(\{x\})\,f(x)\,dx \ge 0$$

if we assume also that $f'(x) \ge 0$ for $a \le x \le b$.

17 Use generating functions to show that $B_m(\frac12) = (2^{1-m} - 1)B_m$, for all
$m \ge 0$.

18 Find $\sum_k \binom{2n}{k}^{\alpha}$ with relative error $O(n^{-1/4})$, when $\alpha > 0$.

Homework  exercises 

19 Use a computer to compare the left and right sides of the approximations
in Table 438, when $n = 10$, $z = \alpha = 0.1$, and $O(f(n)) = O(f(z)) = 0$.

20 Prove or disprove the following estimates, as $n \to \infty$:

a $O\bigl((n^2/\log\log n)^{1/2}\bigr) = O\bigl(\lfloor\sqrt n\rfloor^2\bigr)$.

b $e^{(1+O(1/n))^2} = e + O(1/n)$.

c $n! = O\bigl(((1 - 1/n)^n n)^n\bigr)$.

21 Equation (9.48) gives the $n$th prime with relative error $O(\log n)^{-2}$.
Improve the relative error to $O(\log n)^{-3}$ by starting with another term of
(9.31) in (9.46).

22 Improve (9.54) to $O(n^{-3})$.

23 Push the approximation (9.62) further, getting absolute error $O(n^{-3})$.
Hint: Let $g_n = c/(n+1)(n+2) + h_n$; what recurrence does $h_n$ satisfy?

24 Suppose $a_n = O(f(n))$ and $b_n = O(f(n))$. Prove or disprove that the
convolution $\sum_{k=0}^n a_k b_{n-k}$ is also $O(f(n))$, in the following cases:

a $f(n) = n^{-\alpha}$, $\alpha > 1$.

b $f(n) = \alpha^{-n}$, $\alpha > 1$.

25 Prove (9.1) and (9.2), with which we opened this chapter.

26 Equation (9.91) shows how to evaluate $\ln 10!$ with an absolute error $<
\frac1{126000000}$. Therefore if we take exponentials, we get $10!$ with a relative
error that is less than $e^{1/126000000} - 1 < 10^{-8}$. (In fact, the approximation
gives 3628799.9714.) If we now round to the nearest integer, knowing that
$10!$ is an integer, we get an exact result.

Is it always possible to calculate $n!$ in a similar way, if enough terms of
Stirling’s approximation are computed? Estimate the value of $m$ that
gives the best approximation to $\ln n!$, when $n$ is a fixed (large) integer.
Compare the absolute error in this approximation with $n!$ itself.

27 Use Euler’s summation formula to find the asymptotic value of $H_n^{(-\alpha)} =
\sum_{k=1}^n k^{\alpha}$, where $\alpha$ is any fixed real number. (Your answer may involve a
constant that you do not know in closed form.)

28 Exercise 5.13 defines the hyperfactorial function $Q_n = 1^1 2^2 \ldots n^n$. Find
the asymptotic value of $Q_n$ with relative error $O(n^{-1})$. (Your answer
may involve a constant that you do not know in closed form.)

29 Estimate the function $1^{1/1} 2^{1/2} \ldots n^{1/n}$ as in the previous exercise.

30 Find the asymptotic value of $\sum_{k\ge0} k^l e^{-k^2/n}$ with absolute error $O(n^{-3})$,
when $l$ is a fixed nonnegative integer.

31 Evaluate $\sum_{k\ge0} 1/(c^k + c^m)$ with absolute error $O(c^{-3m})$, when $c > 1$ and
$m$ is a positive integer.

Exam  problems 

32 Evaluate $e^{H_n + H_n^{(2)}}$ with absolute error $O(n^{-1})$.

33 Evaluate $\sum_{k\ge0} \binom nk \big/ n^{\overline{k}}$ with absolute error $O(n^{-3})$.

34 Determine values $A$ through $F$ such that $(1 + 1/n)^{nH_n}$ is

$$An + B(\ln n)^2 + C\ln n + D + \frac{E(\ln n)^2}{n} + \frac{F\ln n}{n} + O(n^{-1}).$$

35 Evaluate $\sum_{k=1}^n 1/kH_k$ with absolute error $O(1)$.

36 Evaluate $S_n = \sum_{k=1}^n 1/(n^2 + k^2)$ with absolute error $O(n^{-5})$.

37 Evaluate $\sum_{k=1}^n (n \bmod k)$ with absolute error $O(n\log n)$.

38 Evaluate $\sum_{k\ge0} k^k\binom nk$ with relative error $O(n^{-1})$.

39 Evaluate $\sum_{0\le k<n} \ln(n-k)\,(\ln n)^k/k!$ with absolute error $O(n^{-1})$. Hint:
Show that the terms for $k \ge 10\ln n$ are negligible.

40 Let $m$ be a (fixed) positive integer. Evaluate $\sum_{k=1}^n (-1)^k H_k^m$ with
absolute error $O(1)$.

41 Evaluate the “Fibonacci factorial” $\prod_{k=1}^n F_k$ with relative error $O(n^{-1})$
or better. Your answer may involve a constant whose value you do not
know in closed form.

42 Let $\alpha$ be a constant in the range $0 < \alpha < \frac12$. We’ve seen in previous
chapters that there is no general closed form for the sum $\sum_{k\le\alpha n}\binom nk$.
Show that there is, however, an asymptotic formula

$$\sum_{k\le\alpha n}\binom nk = 2^{\,nH(\alpha) - \frac12\lg n + O(1)},$$

where $H(\alpha) = \alpha\lg\frac1\alpha + (1-\alpha)\lg\bigl(\frac1{1-\alpha}\bigr)$. Hint: Show that $\binom{n}{k-1} < \frac{\alpha}{1-\alpha}\binom nk$
for $0 < k \le \alpha n$.

43 Show that $C_n$, the number of ways to change $n$ cents (as considered in
Chapter 7) is asymptotically $cn^4 + O(n^3)$ for some constant $c$. What is
that constant?

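44 Prove that

$$x^{\underline{1/2}} = x^{1/2}\begin{bmatrix}1/2\\1/2\end{bmatrix} - x^{-1/2}\begin{bmatrix}1/2\\-1/2\end{bmatrix} + x^{-3/2}\begin{bmatrix}1/2\\-3/2\end{bmatrix} + O(x^{-5/2})$$

as $x \to \infty$. (Recall the definition $x^{\underline{1/2}} = x!/(x - \frac12)!$ in (5.88), and the
definition of generalized Stirling numbers in Table 258.)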

45  Let $\alpha$ be an irrational number between 0 and 1. Chapter 3 discusses the quantity $D(\alpha, n)$, which measures the maximum discrepancy by which the fractional parts $\{k\alpha\}$ for $0 \le k < n$ deviate from a uniform distribution. The recurrence

$$D(\alpha, n) \le D\bigl(\{\alpha^{-1}\}, \lfloor \alpha n\rfloor\bigr) + \alpha^{-1} + 2$$

was proved in (3.31); we also have the obvious bounds

$$0 \le D(\alpha, n) \le n.$$

Prove that $\lim_{n\to\infty} D(\alpha, n)/n = 0$. Hint: Chapter 6 discusses continued fractions.

46  Show that the Bell number $\varpi_n = e^{-1}\sum_{k\ge0} k^n/k!$ of exercise 7.15 is asymptotically equal to

$$m(n)^n\, e^{m(n)-n-1/2} / \sqrt{\ln n},$$

where $m(n)\ln m(n) = n - \frac12$, and estimate the relative error in this approximation.

47  Let m be an integer $\ge 2$. Analyze the two sums

$$\sum_{k=1}^{n} \lfloor \log_m k\rfloor \quad\text{and}\quad \sum_{k=1}^{n} \lceil \log_m k\rceil;$$

which is asymptotically closer to $\log_m n!$?

48  Consider a table of the harmonic numbers $H_k$ for $1 \le k \le n$ in decimal notation. The kth entry $\hat H_k$ has been correctly rounded to $d_k$ significant digits, where $d_k$ is just large enough to distinguish this value from the values of $H_{k-1}$ and $H_{k+1}$. For example, here is an extract from the table, showing five entries where $H_k$ passes 10:

    k        H_k             \hat H_k    d_k
    12364     9.99980041-     9.9998     5
    12365     9.99988128+     9.9999     5
    12366     9.99996215-     9.99996    6
    12367    10.00004301-    10.0000     6
    12368    10.00012386+    10.0001     6

Estimate the total number of digits in the table, $\sum_{k=1}^{n} d_k$, with an absolute error of $O(n)$.

49  In Chapter 6 we considered the tale of a worm that reaches the end of a stretching band after n seconds, where $H_{n-1} < 100 \le H_n$. Prove that if n is a positive integer such that $H_{n-1} \le \alpha \le H_n$, then

$$\lfloor e^{\alpha-\gamma}\rfloor \le n \le \lceil e^{\alpha-\gamma}\rceil.$$

50  Venture capitalists in Silicon Valley are being offered a deal giving them a chance for an exponential payoff on their investments: For an n million dollar investment, where $n \ge 2$, the GKP consortium promises to pay up to N million dollars after one year, where $N = 10^n$. Of course there’s some risk; the actual deal is that GKP pays k million dollars with probability $1/(k^2 H_N^{(2)})$, for each integer k in the range $1 \le k \le N$. (All payments are in megabucks, that is, in exact multiples of $1,000,000; the payoff is determined by a truly random process.) Notice that an investor always gets at least a million dollars back.

a What is the asymptotic expected return after one year, if n million dollars are invested? (In other words, what is the mean value of the payment?) Your answer should be correct within an absolute error of $O(10^{-n})$ dollars.

I once earned $O(10^{-n})$ dollars.

b What is the asymptotic probability that you make a profit, if you invest n million? (In other words, what is the chance that you get back more than you put in?) Your answer here should be correct within an absolute error of $O(n^{-3})$.

Bonus  problems 

51  Prove or disprove: $\int_n^\infty O(x^{-2})\,dx = O(n^{-1})$ as $n \to \infty$.

52  Show that there exists a power series $A(z) = \sum_{n\ge0} a_n z^n$, convergent for all complex z, such that

$$A(n) \succ \left. n^{n^{\cdot^{\cdot^{\cdot^{n}}}}} \right\}\, n.$$

53  Prove that if f(x) is a function whose derivatives satisfy

$$f'(x) \le 0,\quad -f''(x) \le 0,\quad f'''(x) \le 0,\quad \ldots,\quad (-1)^m f^{(m+1)}(x) \le 0$$

for all $x \ge 0$, then we have

$$f(x) = f(0) + \frac{f'(0)}{1!}x + \cdots + \frac{f^{(m-1)}(0)}{(m-1)!}x^{m-1} + O(x^m), \qquad\text{for } x \ge 0.$$

In particular, the case $f(x) = -\ln(1+x)$ proves (9.64) for all $k, n > 0$.

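54  Let f(x) be a positive, differentiable function such that $x f'(x) \prec f(x)$ as $x \to \infty$. Prove that

$$\sum_{k\ge n} \frac{f(k)}{k^{1+\alpha}} = O\!\left(\frac{f(n)}{n^{\alpha}}\right), \qquad\text{if } \alpha > 0.$$

Hint: Consider the quantity $f(k - \frac12)/(k - \frac12)^{\alpha} - f(k + \frac12)/(k + \frac12)^{\alpha}$.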

55  Improve (9.99) to relative error $O(n^{-3/2+5\epsilon})$.

56  The quantity $Q(n) = 1 + \frac{n-1}{n} + \frac{n-1}{n}\,\frac{n-2}{n} + \cdots = \sum_{k\ge1} n^{\underline{k}}/n^k$ occurs in the analysis of many algorithms. Find its asymptotic value, with absolute error $o(1)$.

57  An asymptotic formula for Golomb’s sum $\sum_{k\ge1} 1/\bigl(k\lfloor 1 + \log_n k\rfloor^2\bigr)$ is derived in (9.54). Find an asymptotic formula for the analogous sum without floor brackets, $\sum_{k\ge1} 1/\bigl(k(1 + \log_n k)^2\bigr)$. Hint: We have $\int_0^\infty u e^{-u} k^{-tu}\,du = 1/(1 + t\ln k)^2$.

58  Prove that

$$B_m(\{x\}) = -2\,\frac{m!}{(2\pi)^m} \sum_{k\ge1} \frac{\cos(2\pi kx - \frac12\pi m)}{k^m}, \qquad\text{for } m \ge 2,$$

by using residue calculus, integrating

$$\frac{1}{2\pi i}\oint \frac{2\pi i\, e^{2\pi i z\theta}}{e^{2\pi i z} - 1}\,\frac{dz}{z^m}$$

on the square contour $z = x + iy$, where $\max(|x|, |y|) = M + \frac12$, and letting the integer M tend to $\infty$.

59  Let $\Theta_n(t) = \sum_k e^{-(k+t)^2/n}$, a periodic function of t. Show that the expansion of $\Theta_n(t)$ as a Fourier series is

$$\Theta_n(t) = \sqrt{\pi n}\,\bigl(1 + 2e^{-\pi^2 n}(\cos 2\pi t) + 2e^{-4\pi^2 n}(\cos 4\pi t) + 2e^{-9\pi^2 n}(\cos 6\pi t) + \cdots\bigr).$$

(This formula gives a rapidly convergent series for the sum $\Theta_n = \Theta_n(0)$ in equation (9.93).)

60  Explain why the coefficients in the asymptotic expansion

$$\binom{2n}{n} = \frac{4^n}{\sqrt{\pi n}}\left(1 - \frac{1}{8n} + \frac{1}{128n^2} + \frac{5}{1024n^3} - \frac{21}{32768n^4} + \cdots\right)$$

all have denominators that are powers of 2.

61  Exercise 45 proves that the discrepancy $D(\alpha, n)$ is $o(n)$ for all irrational numbers $\alpha$. Exhibit an irrational $\alpha$ such that $D(\alpha, n)$ is not $O(n^{1-\epsilon})$ for any $\epsilon > 0$.

62  Given n, let ${n \brace m(n)} = \max_k {n \brace k}$ be the largest entry in row n of Stirling’s subset triangle. Show that for all sufficiently large n, we have $m(n) = \lfloor \overline{m}(n)\rfloor$ or $m(n) = \lceil \overline{m}(n)\rceil$, where

$$\overline{m}(n)\bigl(\overline{m}(n) + 2\bigr)\ln\bigl(\overline{m}(n) + 2\bigr) = n\bigl(\overline{m}(n) + 1\bigr).$$


+ 0(n 


Hint:  This  is  difficult. 


63  Prove that S. W. Golomb’s self-describing sequence of exercise 2.36 satisfies $f(n) = \phi^{2-\phi} n^{\phi-1} + O(n^{\phi-1}/\log n)$.

64  Find a proof of the identity

$$\sum_{n\ge1} \frac{\cos 2n\pi x}{n^2} = \pi^2\bigl(x^2 - x + \tfrac16\bigr) \qquad\text{for } 0 \le x \le 1,$$

that uses only “Eulerian” (eighteenth-century) mathematics.

Research problems

65  Find a “combinatorial” proof of Stirling’s approximation. (Note that $n^n$ is the number of mappings of $\{1, 2, \ldots, n\}$ into itself, and $n!$ is the number of mappings of $\{1, 2, \ldots, n\}$ onto itself.)

66  Consider an n × n array of dots, $n \ge 3$, in which each dot has four neighbors. (At the edges we “wrap around” modulo n.) Let $X_n$ be the number of ways to assign the colors red, white, and blue to these dots in such a way that no neighboring dots have the same color. (Thus $X_3 = 12$.) Prove that

$$X_n \sim \bigl(\tfrac43\bigr)^{3n^2/2} e^{-\pi/6}.$$

67  Let $Q_n$ be the least integer m such that $H_m > n$. Find the smallest integer n such that $Q_n \ne \lfloor e^{n-\gamma} + \frac12\rfloor$, or prove that no such n exist.


Th-th-th-that’s  all, 
folks! 


A 


Answers  to  Exercises 


(The  first  finder  of
every  error  in  this 
book  will  receive 
a reward  of  $2.56.) 

Does  that  mean 
I have  to  find  every 
error? 

(We  meant  to  say 
“any  error.”)

Does  that  mean 
only  one  person  gets 
a reward? 

(Hmmm.  Try  it  and 

see.) 


The  number  of 
intersection  points 
turns  out  to  give 
the  whole  story;
convexity  was  a red 
herring. 


EVERY  EXERCISE  is  answered  here  (at  least  briefly),  and  some  of  these 
answers  go  beyond  what  was  asked.  Readers  will  learn  best  if  they  make  a 
serious  attempt  to  find  their  own  answers  before  peeking  at  this  appendix. 

The  authors  will  be  interested  to  learn  of  any  solutions  (or  partial 
solutions)  to  the  research  problems,  or  of  any  simpler  (or  more  correct)  ways 
to  solve  the  non-research  ones. 

1.1  The proof is fine except when n = 2. If all sets of two horses have horses of the same color, the statement is true for any number of horses.

1.2  If $X_n$ is the number of moves, we have $X_0 = 0$ and $X_n = X_{n-1} + 1 + X_{n-1} + 1 + X_{n-1}$ when n > 0. It follows (for example by adding 1 to both sides) that $X_n = 3^n - 1$. (After $\frac12 X_n$ moves, it turns out that the entire tower will be on the middle peg, halfway home!)
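The recurrence in answer 1.2 can be replayed mechanically. The following is a minimal sketch, assuming the exercise's rule that every move must start or finish on the middle peg; the peg numbering 0, 1, 2 and the helper names `restricted_hanoi` and `replay` are illustrative choices, not notation from the text.

```python
def restricted_hanoi(n, src=0, dst=2, moves=None):
    """Return the (disk, from_peg, to_peg) moves that carry an n-disk tower
    between the end pegs 0 and 2 when every move must touch middle peg 1."""
    if moves is None:
        moves = []
    if n > 0:
        restricted_hanoi(n - 1, src, dst, moves)   # X_{n-1} moves
        moves.append((n, src, 1))                  # largest disk to the middle
        restricted_hanoi(n - 1, dst, src, moves)   # X_{n-1} moves
        moves.append((n, 1, dst))                  # largest disk to its goal
        restricted_hanoi(n - 1, src, dst, moves)   # X_{n-1} moves
    return moves


def replay(n, moves):
    """Replay moves on three pegs, checking legality, and return the pegs."""
    pegs = [list(range(n, 0, -1)), [], []]
    for disk, a, b in moves:
        assert pegs[a] and pegs[a][-1] == disk     # disk is on top of peg a
        assert not pegs[b] or pegs[b][-1] > disk   # never onto a smaller disk
        assert 1 in (a, b)                         # no direct end-to-end move
        pegs[b].append(pegs[a].pop())
    return pegs


for n in range(1, 7):
    moves = restricted_hanoi(n)
    assert len(moves) == 3**n - 1                  # X_n = 3^n - 1
    halfway = replay(n, moves[: len(moves) // 2])
    assert halfway[1] == list(range(n, 0, -1))     # whole tower on the middle peg
```

The loop checks both claims for small n: the move count matches $3^n - 1$, and after half the moves the whole tower sits on the middle peg.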

1.3  There are $3^n$ possible arrangements, since each disk can be on any of the pegs. We must hit them all, since the shortest solution takes $3^n - 1$ moves. (This construction is equivalent to a “ternary Gray code,” which runs through all numbers from $(0\ldots0)_3$ to $(2\ldots2)_3$, changing only one digit at a time.)

1.4  No. If the largest disk doesn’t have to move, $2^{n-1} - 1$ moves will suffice (by induction); otherwise $(2^{n-1} - 1) + 1 + (2^{n-1} - 1)$ will suffice (again by induction).

1.5  No; different circles can intersect in at most two points, so the fourth circle can increase the number of regions to at most 14. However, it is possible to do the job with ovals:

[figure: four overlapping ovals]

Venn [294] claimed that there is no way to do the five-set case with ellipses, but a five-set construction with ellipses was found by Grünbaum [137].

1.6  If the nth line intersects the previous lines in k > 0 distinct points, we get k − 1 new bounded regions (assuming that none of the previous lines were mutually parallel) and two new infinite regions. Hence the maximum number of bounded regions is $(n-2) + (n-3) + \cdots = S_{n-2} = (n-1)(n-2)/2 = L_n - 2n$.

This answer assumes that n > 0.

1.7  The basis is unproved; and in fact, $H(1) \ne 2$.

1.8  $Q_2 = (1+\beta)/\alpha$; $Q_3 = (1+\alpha+\beta)/\alpha\beta$; $Q_4 = (1+\alpha)/\beta$; $Q_5 = \alpha$; $Q_6 = \beta$. So the sequence is periodic!

1.9  (a) We get P(n − 1) from the inequality

$$x_1 \ldots x_{n-1}\left(\frac{x_1 + \cdots + x_{n-1}}{n-1}\right) \le \left(\frac{x_1 + \cdots + x_{n-1}}{n-1}\right)^{n}.$$

(b) $x_1 \ldots x_n x_{n+1} \ldots x_{2n} \le \bigl(((x_1 + \cdots + x_n)/n)((x_{n+1} + \cdots + x_{2n})/n)\bigr)^n$ by P(n); the product inside is $\le ((x_1 + \cdots + x_{2n})/2n)^2$ by P(2). (c) For example, P(5) follows from P(6) from P(3) from P(4) from P(2).

1.10  First show that $R_n = R_{n-1} + 1 + Q_{n-1} + 1 + R_{n-1}$, when n > 0. Incidentally, the methods of Chapter 7 will tell us that $Q_n = \bigl((1+\sqrt3)^{n+1} - (1-\sqrt3)^{n+1}\bigr)/(2\sqrt3) - 1$.

1.11  (a) We cannot do better than to move a double (n − 1)-tower, then move (and invert the order of) the two largest disks, then move the double (n − 1)-tower again; hence $A_n = 2A_{n-1} + 2$ and $A_n = 2T_n = 2^{n+1} - 2$. This solution interchanges the two largest disks but returns the other 2n − 2 to their original order.

(b) Let $B_n$ be the minimum number of moves. Then $B_1 = 3$, and it can be shown that no strategy does better than $B_n = A_{n-1} + 2 + A_{n-1} + 2 + B_{n-1}$ when n > 1. Hence $B_n = 2^{n+2} - 5$, for all n > 0. Curiously this is just $2A_n - 1$, and we also have $B_n = A_{n-1} + 1 + A_{n-1} + 1 + A_{n-1} + 1 + A_{n-1}$.

1.12  If all $m_k > 0$, then $A(m_1, \ldots, m_n) = 2A(m_1, \ldots, m_{n-1}) + m_n$. This is an equation of the “generalized Josephus” type, with solution $(m_1 \ldots m_n)_2 = 2^{n-1}m_1 + \cdots + 2m_{n-1} + m_n$.

Incidentally, the corresponding generalization of exercise 11b appears to satisfy the recurrence

$$B(m_1, \ldots, m_n) = \begin{cases} A(m_1, \ldots, m_n), & \text{if } m_n = 1;\\ 2m_n - 1, & \text{if } n = 1;\\ 2A(m_1, \ldots, m_{n-1}) + 2m_n + B(m_1, \ldots, m_{n-1}), & \text{if } n > 1 \text{ and } m_n > 1. \end{cases}$$

1.13  Given n straight lines that define $L_n$ regions, we can replace them by extremely narrow zig-zags with segments sufficiently long that there are nine intersections between each pair of zig-zags. This shows that $ZZ_n = ZZ_{n-1} + 9n - 8$, for all n > 0; consequently $ZZ_n = 9S_n - 8n + 1 = \frac92 n^2 - \frac72 n + 1$.
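The closed forms quoted in answers 1.11 and 1.13 can be checked against their recurrences by direct iteration. This is only a sketch: the base cases $A_0 = 0$ and $ZZ_0 = 1$ are assumptions (no disks need no moves; no zig-zags leave one region), and `closed_form_checks` is an illustrative name.

```python
def closed_form_checks(limit=25):
    """Iterate the recurrences for A_n, B_n and ZZ_n and compare each value
    with the closed form stated in the answers."""
    A = [0]                        # assumed base case A_0 = 0
    for n in range(1, limit + 1):
        A.append(2 * A[n - 1] + 2)             # A_n = 2 A_{n-1} + 2

    B = [None, 3]                  # B_1 = 3; B_0 is not used
    for n in range(2, limit + 1):
        B.append(A[n - 1] + 2 + A[n - 1] + 2 + B[n - 1])

    ZZ = [1]                       # assumed base case ZZ_0 = 1
    for n in range(1, limit + 1):
        ZZ.append(ZZ[n - 1] + 9 * n - 8)       # ZZ_n = ZZ_{n-1} + 9n - 8

    for n in range(1, limit + 1):
        assert A[n] == 2 ** (n + 1) - 2
        assert B[n] == 2 ** (n + 2) - 5 == 2 * A[n] - 1 == 4 * A[n - 1] + 3
        assert 2 * ZZ[n] == 9 * n * n - 7 * n + 2   # ZZ_n = (9n^2 - 7n + 2)/2
    return A, B, ZZ


A, B, ZZ = closed_form_checks()
```

Working with `2 * ZZ[n]` keeps the comparison in integers, so no rounding enters the check.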

1.14  The number of new 3-dimensional regions defined by each new cut is the number of 2-dimensional regions defined in the new plane by its intersections with the previous planes. Hence $P_n = P_{n-1} + L_{n-1}$, and it turns out that $P_5 = 26$. (Six cuts in a cubical piece of cheese can make 27 cubelets, or up to $P_6 = 42$ cuts of weirder shapes.)

Incidentally, the solution to this recurrence fits into a nice pattern if we express it in terms of binomial coefficients (see Chapter 5):

$$X_n = \binom{n}{0} + \binom{n}{1};$$
$$L_n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2};$$
$$P_n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \binom{n}{3}.$$

I bet I know what happens in four dimensions!

Here $X_n$ is the maximum number of 1-dimensional regions definable by n points on a line.
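A short check of the binomial pattern in answer 1.14, written as a sketch. It assumes the Chapter 1 recurrence $L_n = L_{n-1} + n$ with $L_0 = 1$ for lines in the plane and the base case $P_0 = 1$; the function name `regions` is illustrative.

```python
from math import comb


def regions(limit=12):
    """Return lists L and P built from the recurrences
    L_n = L_{n-1} + n (assumed from Chapter 1) and P_n = P_{n-1} + L_{n-1}."""
    L = [1]
    P = [1]
    for n in range(1, limit + 1):
        L.append(L[n - 1] + n)
        P.append(P[n - 1] + L[n - 1])
    return L, P


L, P = regions()
for n in range(len(P)):
    assert L[n] == sum(comb(n, i) for i in range(3))   # C(n,0)+C(n,1)+C(n,2)
    assert P[n] == sum(comb(n, i) for i in range(4))   # ... + C(n,3)
assert P[5] == 26 and P[6] == 42
```

`math.comb` returns 0 when the lower index exceeds the upper one, so the same sums work for small n without special cases.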

1.15  The function I satisfies the same recurrence as J when n > 1, but I(1) is undefined. Since I(2) = 2 and I(3) = 1, there’s no value of $I(1) = \alpha$ that will allow us to use our general method; the “end game” of unfolding depends on the two leading bits in n’s binary representation.

If $n = 2^m + 2^{m-1} + k$, where $0 \le k < 2^{m+1} + 2^m - (2^m + 2^{m-1}) = 2^m + 2^{m-1}$, the solution is I(n) = 2k + 1 for all n > 2. Another way to express this, in terms of the representation $n = 2^m + l$, is to say that

$$I(n) = \begin{cases} J(n) + 2^{m-1}, & \text{if } 0 \le l < 2^{m-1};\\ J(n) - 2^{m}, & \text{if } 2^{m-1} \le l < 2^{m}. \end{cases}$$


1.16  Let $g(n) = a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 + d(n)\gamma$. We know from (1.18) that $a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 = (\alpha\,\beta_{b_{m-1}}\,\beta_{b_{m-2}} \ldots \beta_{b_1}\,\beta_{b_0})_3$ when $n = (1\,b_{m-1} \ldots b_1\,b_0)_2$; this defines a(n), b(n), and c(n). Setting g(n) = n in the recurrence implies that a(n) + c(n) − d(n) = n; hence we know everything. [Setting g(n) = 1 gives the additional identity a(n) − 2b(n) − 2c(n) = 1, which can be used to define b(n) in terms of the simpler functions a(n) and a(n) + c(n).]

1.17  In general we have $W_m \le 2W_{m-k} + T_k$, for $0 \le k \le m$. (This relation corresponds to transferring the top m − k, then using only three pegs to move the bottom k, then finishing with the top m − k.) The stated relation turns out to be based on the unique value of k that minimizes the right-hand side of this general inequality, when m = n(n + 1)/2. (However, we cannot conclude that equality holds; many other strategies for transferring the tower are conceivable.) If we set $Y_n = (W_{n(n+1)/2} - 1)/2^n$, we find that $Y_n \le Y_{n-1} + 1$; hence $W_{n(n+1)/2} \le 2^n(n-1) + 1$.

1.18  It suffices to show that both of the lines from $(n^{2j}, 0)$ intersect both of the lines from $(n^{2k}, 0)$, and that all these intersection points are distinct.

A line from $(x_j, 0)$ through $(x_j - a_j, 1)$ intersects a line from $(x_k, 0)$ through $(x_k - a_k, 1)$ at the point $(x_j - ta_j, t)$ where $t = (x_k - x_j)/(a_k - a_j)$. Let $x_j = n^{2j}$ and $a_j = n^j + (0 \text{ or } n^{-n})$. Then the ratio $t = (n^{2k} - n^{2j})/\bigl(n^k - n^j + (-n^{-n} \text{ or } 0 \text{ or } n^{-n})\bigr)$ lies strictly between $n^j + n^k - 1$ and $n^j + n^k + 1$; hence the y coordinate of the intersection point uniquely identifies j and k. Also the four intersections that have the same j and k are distinct.

1.19  Not when n > 11. A bent line whose half-lines run at angles $\theta$ and $\theta + 30°$ from its apex can intersect four times with another whose half-lines run at angles $\phi$ and $\phi + 30°$ only if $|\theta - \phi| > 30°$. We can’t choose more than 11 angles this far apart from each other. (Is it possible to choose 11?)

1.20  Let $h(n) = a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 + d(n)\gamma_0 + e(n)\gamma_1$. We know from (1.18) that $a(n)\alpha + b(n)\beta_0 + c(n)\beta_1 = (\alpha\,\beta_{b_{m-1}}\,\beta_{b_{m-2}} \ldots \beta_{b_1}\,\beta_{b_0})_4$ when $n = (1\,b_{m-1} \ldots b_1\,b_0)_2$; this defines a(n), b(n), and c(n). Setting h(n) = n in the recurrence implies that a(n) + c(n) − 2d(n) − 2e(n) = n; setting $h(n) = n^2$ implies that $a(n) + c(n) + 4e(n) = n^2$. Hence $d(n) = (3a(n) + 3c(n) - n^2 - 2n)/4$; $e(n) = (n^2 - a(n) - c(n))/4$.

1.21  We can let m be the least (or any) common multiple of 2n, 2n − 1, ..., n + 1. [A non-rigorous argument suggests that a “random” value of m will succeed with probability

$$\frac{n}{2n}\,\frac{n-1}{2n-1} \cdots \frac{1}{n+1} = 1\Big/\binom{2n}{n} \sim \frac{\sqrt{\pi n}}{4^n},$$

so we might expect to find such an m less than $4^n$.]
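The choice of m in answer 1.21 can be tried out by simulation. The sketch below assumes the exercise's setup of 2n people in a circle, numbered 1 through 2n, of whom the last n are to be removed first; `lcm_range` and `execution_order` are illustrative helper names.

```python
from functools import reduce
from math import gcd


def lcm_range(lo, hi):
    """Least common multiple of the integers lo, lo+1, ..., hi."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(lo, hi + 1), 1)


def execution_order(people, m):
    """Order in which persons 1..people leave when every m-th one is removed."""
    circle = list(range(1, people + 1))
    order = []
    idx = 0
    while circle:
        idx = (idx + m - 1) % len(circle)   # count m places from the current start
        order.append(circle.pop(idx))
    return order


for n in range(1, 9):
    m = lcm_range(n + 1, 2 * n)
    first_n = execution_order(2 * n, m)[:n]
    # Because m is a multiple of the current circle size at each of the first
    # n steps, the count always ends on the last remaining person.
    assert first_n == list(range(2 * n, n, -1))
```

For n = 3 this uses m = lcm(4, 5, 6) = 60, and the first three people removed are 6, 5, 4.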

1.22  Take a regular polygon with $2^n$ sides and label the sides with the elements of a “de Bruijn cycle” of length $2^n$. (This is a cyclic sequence of 0’s and 1’s in which all n-tuples of adjacent elements are different; see [173, exercise 2.3.4.2-23] and [174, exercise 3.2.2-17].) Attach a very thin convex extension to each side that’s labeled 1. The n sets are copies of the resulting polygon, rotated by the length of k sides for k = 0, 1, ..., n − 1.

I once rode a de Bruijn cycle (when visiting at his home in Nuenen, The Netherlands).

1.23  Yes. (We need principles of elementary number theory from Chapter 4.) Let L(n) = lcm(1, 2, ..., n). We can assume that n > 2; hence by Bertrand’s postulate there is a prime p between n/2 and n. We can also assume that j > n/2, since q′ = L(n) + 1 − q leaves j′ = n + 1 − j if and only if q leaves j. Choose q so that $q \equiv 1 \pmod{L(n)/p}$ and $q \equiv j + 1 - n \pmod p$. The people are now removed in order 1, 2, ..., n − p, j + 1, j + 2, ..., n, n − p + 1, ..., j − 1.

1.24  The only known examples are: $X_n = a/X_{n-1}$, which has period 2; R. C. Lyness’s recurrence of period 5 in exercise 8; H. Todd’s recurrence $X_n = (1 + X_{n-1} + X_{n-2})/X_{n-3}$, which has period 8; and recurrences derived from these by substitutions of the form $Y_n = \alpha X_{mn}$. An exhaustive search by Bill Gosper turned up no nontrivial solutions of period 4 when k = 2.

A partial theory has been developed by Lyness [210] and by Kurshan and Gopinath [189]. An interesting example of another type, with period 9 when the starting values are real, is the recurrence $X_n = |X_{n-1}| - X_{n-2}$ discovered by Morton Brown [38]. Nonlinear recurrences having any desired period $\ge 5$ can be based on continuants [55].

1.25  If $T^{(k)}(n)$ denotes the minimum number of moves needed to transfer n disks with k auxiliary pegs (hence $T^{(1)}(n) = T_n$ and $T^{(2)}(n) = W_n$), we have $T^{(k)}\bigl(\binom{n+1}{k}\bigr) \le 2T^{(k)}\bigl(\binom{n}{k}\bigr) + T^{(k-1)}\bigl(\binom{n}{k-1}\bigr)$. No examples (n, k) are known where this inequality fails to be an equality. When k is small compared with n, the formula $2^{n+1-k}\binom{n}{k-1}$ gives a convenient (but non-optimum) upper bound on $T^{(k)}\bigl(\binom{n}{k}\bigr)$.

1.26  The execution-order permutation can be computed in O(n log n) steps for all m and n [175, exercises 5.1.1-2 and 5.1.1-5]. Bjorn Poonen [241] has proved that non-Josephus sets with exactly four “bad guys” exist whenever $n \equiv 0 \pmod 3$ and $n \ge 9$; in fact, the number of such sets is at least $\epsilon\binom{n}{4}$ for some $\epsilon > 0$. He also found by extensive computations that the only other n < 24 with non-Josephus sets is n = 20, which has 236 such sets with k = 14 and two with k = 13. (One of the latter is {1, 2, 3, 4, 5, 6, 7, 8, 11, 14, 15, 16, 17}; the other is its reflection with respect to 21.) There is a unique non-Josephus set with n = 15 and k = 9, namely {3, 4, 5, 6, 8, 10, 11, 12, 13}.

2.1  There’s no agreement about this; three answers are defensible: (1) We can say that $\sum_{k=m}^{n} q_k$ is always equivalent to $\sum_{m\le k\le n} q_k$; then the stated sum is zero. (2) A person might say that the given sum is $q_4 + q_3 + q_2 + q_1 + q_0$, by summing over decreasing values of k. But this conflicts with the generally accepted convention that $\sum_{k=1}^{n} q_k = 0$ when n = 0. (3) We can say that $\sum_{k=m}^{n} q_k = \sum_{k\le n} q_k - \sum_{k<m} q_k$; then the stated sum is $-q_1 - q_2 - q_3$. This convention may appear strange, but it obeys the useful law $\sum_{k=a}^{b} + \sum_{k=b+1}^{c} = \sum_{k=a}^{c}$ for all a, b, c.

It’s best to use the notation $\sum_{k=m}^{n}$ only when $n - m \ge -1$; then both conventions (1) and (3) agree.

2.2  This is |x|. Incidentally, the quantity ([x > 0] − [x < 0]) is often called sign(x) or signum(x); it is +1 when x > 0, 0 when x = 0, and −1 when x < 0.


2.3  The first sum is, of course, $a_0 + a_1 + a_2 + a_3 + a_4 + a_5$; the second is $a_4 + a_1 + a_0 + a_1 + a_4$, because the sum is over the values $k \in \{-2, -1, 0, +1, +2\}$. The commutative law doesn’t hold here because the function $p(k) = k^2$ is not a permutation. Some values of n (e.g., n = 3) have no k such that p(k) = n; others (e.g., n = 4) have two such k.


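2.4  (a) $\sum_{i=1}^{4}\sum_{j=i+1}^{4}\sum_{k=j+1}^{4} a_{ijk} = \sum_{i=1}^{2}\sum_{j=i+1}^{3}\sum_{k=j+1}^{4} a_{ijk} = ((a_{123} + a_{124}) + a_{134}) + a_{234}$.

(b) $\sum_{k=1}^{4}\sum_{j=1}^{k-1}\sum_{i=1}^{j-1} a_{ijk} = \sum_{k=3}^{4}\sum_{j=2}^{k-1}\sum_{i=1}^{j-1} a_{ijk} = a_{123} + (a_{124} + (a_{134} + a_{234}))$.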


2.5  The same index ‘k’ is being used for two different index variables, although k is bound in the inner sum. This is a famous mistake in mathematics (and computer programming). The result turns out to be correct if $a_j = a_k$ for all j and k, $1 \le j, k \le n$.


2.6  It’s $[1 \le j \le n]\,(n - j + 1)$. The first factor is necessary here because we should get zero when j < 1 or j > n.

2.7  $m x^{\overline{m-1}}$. A version of finite calculus based on $\nabla$ instead of $\Delta$ would therefore give special prominence to rising factorial powers.

2.8  0, if $m \ge 1$; $1/|m|!$, if $m \le 0$.

2.9  $x^{\overline{m+n}} = x^{\overline{m}}\,(x + m)^{\overline{n}}$, for integers m and n. Setting m = −n tells us that $x^{\overline{-n}} = 1/(x - n)^{\overline{n}} = 1/(x - 1)^{\underline{n}}$.

2.10  Another possible right-hand side is $Eu\,\Delta v + v\,\Delta u$.

2.11  Break  the  left-hand  side  into  two  sums,  and  change  k to  k + 1 in  the 
second  of  these. 

2.12  If p(k) = n then $n + c = k + ((-1)^k + 1)c$ and $((-1)^k + 1)$ is even; hence $(-1)^{n+c} = (-1)^k$ and $k = n - (-1)^{n+c}c$. Conversely, this value of k yields p(k) = n.

2.13  Let $R_0 = \alpha$, and $R_n = R_{n-1} + (-1)^n(\beta + n\gamma + n^2\delta)$ for n > 0. Then $R(n) = A(n)\alpha + B(n)\beta + C(n)\gamma + D(n)\delta$. Setting $R_n = 1$ yields A(n) = 1. Setting $R_n = (-1)^n$ yields $A(n) + 2B(n) = (-1)^n$. Setting $R_n = (-1)^n n$ yields $-B(n) + 2C(n) = (-1)^n n$. Setting $R_n = (-1)^n n^2$ yields $B(n) - 2C(n) + 2D(n) = (-1)^n n^2$. Therefore $2D(n) = (-1)^n(n^2 + n)$; the stated sum is D(n).

2.14  The suggested rewrite is legitimate since we have $k = \sum_{1\le j\le k} 1$ when $1 \le k \le n$. Sum first on k; the multiple sum reduces to

$$\sum_{1\le j\le n} (2^{n+1} - 2^j) = n2^{n+1} - (2^{n+1} - 2).$$

“It  is  a profoundly
erroneous  truism, 
repeated  by  all 
copybooks  and  by 
eminent  people 
when  they  are 
making  speeches, 
that  we  should 
cultivate  the  habit 
of  thinking  of  what 
we  are  doing.  The 
precise  opposite  is 
the  case.  Civiliza- 
tion advances  by 
extending  the  num- 
ber of  important 
operations  which 
we  can  perform 
without  thinking 
about  them.  Opera- 
tions of  thought  are 
like  cavalry  charges 
in  a battle-they 
are  strictly  limited 
in  number,  they 
require  fresh  horses, 
and  must  only  be 
made  at  decisive 
moments.  ” 

-A.  N.  White- 
head  [302] 


2.15  The first step replaces k(k + 1) by $2\sum_{1\le j\le k} j$. The second step gives $\boxdot_n + \Box_n = \bigl(\sum_{k=1}^{n} k\bigr)^2 + \Box_n$.

2.16  $x^{\underline{m}}(x - m)^{\underline{n}} = x^{\underline{m+n}} = x^{\underline{n}}(x - n)^{\underline{m}}$, by (2.52).

2.17  Use induction for the first two =’s, and (2.52) for the third. The second line follows from the first.

2.18  Use the facts that $(\Re z)^+ \le |z|$, $(\Re z)^- \le |z|$, $(\Im z)^+ \le |z|$, $(\Im z)^- \le |z|$, and $|z| \le (\Re z)^+ + (\Re z)^- + (\Im z)^+ + (\Im z)^-$.

2.19  Multiply both sides by $2^{n-1}/n!$ and let $S_n = 2^n T_n/n! = S_{n-1} + 3\cdot2^{n-1} = 3(2^n - 1) + S_0$. The solution is $T_n = 3\cdot n! + n!/2^{n-1}$. (We’ll see in Chapter 4 that $T_n$ is an integer only when n is 0 or a power of 2.)
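The parenthetical remark in answer 2.19 is easy to test with exact rational arithmetic. This sketch takes the closed form $T_n = 3\cdot n! + n!/2^{n-1}$ from the answer as given; the function name `T` mirrors the text, and the range of n tested is arbitrary.

```python
from fractions import Fraction
from math import factorial


def T(n):
    """Closed form T_n = 3*n! + n!/2^(n-1), kept as an exact fraction."""
    return 3 * factorial(n) + factorial(n) / Fraction(2) ** (n - 1)


for n in range(0, 65):
    # S_n = 2^n T_n / n! should satisfy S_n = S_{n-1} + 3 * 2^(n-1).
    if n > 0:
        s_now = Fraction(2) ** n * T(n) / factorial(n)
        s_prev = Fraction(2) ** (n - 1) * T(n - 1) / factorial(n - 1)
        assert s_now - s_prev == 3 * 2 ** (n - 1)
    # T_n is an integer exactly when n is 0 or a power of 2.
    is_zero_or_power_of_two = (n & (n - 1)) == 0
    assert (T(n).denominator == 1) == is_zero_or_power_of_two
```

The bit trick `n & (n - 1) == 0` is true precisely for 0 and the powers of 2, which matches the condition stated in the answer.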

2.20  The perturbation method gives

$$S_n + (n + 1)H_{n+1} = S_n + \Bigl(\sum_{0\le k\le n} H_k\Bigr) + n + 1.$$

2.21  Extracting the final term of $S_{n+1}$ gives $S_{n+1} = 1 - S_n$; extracting the first term gives

$$S_{n+1} = (-1)^{n+1} + \sum_{1\le k\le n+1} (-1)^{n+1-k} = (-1)^{n+1} + \sum_{0\le k\le n} (-1)^{n-k} = (-1)^{n+1} + S_n.$$

Hence $2S_n = 1 + (-1)^n$ and we have $S_n = [n \text{ is even}]$. Similarly, we find

$$T_{n+1} = n + 1 - T_n = \sum_{k=0}^{n} (-1)^{n-k}(k + 1) = T_n + S_n,$$

hence $2T_n = n + 1 - S_n$ and we have $T_n = \frac12(n + [n \text{ is odd}])$. Finally, the same approach yields

$$U_{n+1} = (n + 1)^2 - U_n = U_n + 2T_n + S_n = U_n + n + [n \text{ is odd}] + [n \text{ is even}] = U_n + n + 1.$$

Hence $U_n$ is the triangular number $\frac12(n + 1)n$.
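The three closed forms of answer 2.21 can be compared with brute-force sums. The sketch assumes the definitions $S_n = \sum_{k=0}^{n}(-1)^{n-k}$, $T_n = \sum_{k=0}^{n}(-1)^{n-k}k$ and $U_n = \sum_{k=0}^{n}(-1)^{n-k}k^2$, which are inferred from the derivation rather than quoted from the exercise; `alternating_sum` is an illustrative name.

```python
def alternating_sum(n, power):
    """Sum of (-1)^(n-k) * k^power for 0 <= k <= n."""
    return sum((-1) ** (n - k) * k ** power for k in range(n + 1))


for n in range(0, 200):
    S = alternating_sum(n, 0)
    T = alternating_sum(n, 1)
    U = alternating_sum(n, 2)
    assert S == (1 if n % 2 == 0 else 0)       # S_n = [n is even]
    assert 2 * T == n + (n % 2)                # T_n = (n + [n is odd]) / 2
    assert 2 * U == (n + 1) * n                # U_n = (n + 1) n / 2
```

Python evaluates `0 ** 0` as 1, so the k = 0 term contributes correctly when `power` is 0.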

2.22  Twice the sum gives a “vanilla” sum over $1 \le j, k \le n$, which splits into three sums that can be handled easily.

2.23  (a) This approach gives four sums that evaluate to $2n + H_n - 2n + (H_n + \frac{1}{n+1} - 1)$. (It would have been easier to replace the summand by 1/k + 1/(k + 1).) (b) Let u(x) = 2x + 1 and $\Delta v(x) = 1/x(x + 1) = (x - 1)^{\underline{-2}}$; then $\Delta u(x) = 2$ and $v(x) = -(x - 1)^{\underline{-1}} = -1/x$. The answer is $2H_n - \frac{n}{n+1}$.

2.24  Summing by parts, $\sum x^{\underline{m}} H_x\,\delta x = x^{\underline{m+1}} H_x/(m + 1) - x^{\underline{m+1}}/(m + 1)^2 + C$; hence $\sum_{0\le k<n} k^{\underline{m}} H_k = n^{\underline{m+1}}\bigl(H_n - 1/(m + 1)\bigr)/(m + 1) + 0^{\underline{m+1}}/(m + 1)^2$. In our case m = −2, so the sum comes to $1 - (H_n + 1)/(n + 1)$.
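Both harmonic-number results above (answers 2.23 and 2.24) can be confirmed exactly with rational arithmetic. This is a sketch under the assumption that the sums in question are $\sum_{k=1}^{n}(2k+1)/k(k+1)$ and $\sum_{0\le k<n} H_k/(k+1)(k+2)$, as the answers suggest; the helper name `H` follows the text.

```python
from fractions import Fraction


def H(n):
    """Harmonic number H_n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


for n in range(1, 60):
    # Answer 2.23: sum of (2k+1)/(k(k+1)) for 1 <= k <= n.
    lhs_223 = sum(Fraction(2 * k + 1, k * (k + 1)) for k in range(1, n + 1))
    assert lhs_223 == 2 * H(n) - Fraction(n, n + 1)

    # Answer 2.24 with m = -2: k^(falling -2) = 1/((k+1)(k+2)).
    lhs_224 = sum(H(k) / ((k + 1) * (k + 2)) for k in range(0, n))
    assert lhs_224 == 1 - (H(n) + 1) / (n + 1)
```

Using `Fraction` avoids the floating-point drift that would otherwise make an equality test on harmonic numbers unreliable.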

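2.25  Here are some of the basic analogies:

$$\sum_{k\in K} c\,a_k = c\sum_{k\in K} a_k \quad\longleftrightarrow\quad \prod_{k\in K} a_k^{\,c} = \Bigl(\prod_{k\in K} a_k\Bigr)^{c}$$

$$\sum_{k\in K} (a_k + b_k) = \sum_{k\in K} a_k + \sum_{k\in K} b_k \quad\longleftrightarrow\quad \prod_{k\in K} a_k b_k = \Bigl(\prod_{k\in K} a_k\Bigr)\Bigl(\prod_{k\in K} b_k\Bigr)$$

$$\sum_{k\in K} a_k = \sum_{p(k)\in K} a_{p(k)} \quad\longleftrightarrow\quad \prod_{k\in K} a_k = \prod_{p(k)\in K} a_{p(k)}$$

$$\sum_{j\in J,\,k\in K} a_{j,k} = \sum_{j\in J}\sum_{k\in K} a_{j,k} \quad\longleftrightarrow\quad \prod_{j\in J,\,k\in K} a_{j,k} = \prod_{j\in J}\prod_{k\in K} a_{j,k}$$

$$\sum_{k\in K} a_k = \sum_{k} a_k\,[k\in K] \quad\longleftrightarrow\quad \prod_{k\in K} a_k = \prod_{k} a_k^{[k\in K]}$$

$$\sum_{k\in K} 1 = \#K \quad\longleftrightarrow\quad \prod_{k\in K} c = c^{\#K}$$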

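2.26  $P^2 = \bigl(\prod_{1\le j,k\le n} a_j a_k\bigr)\bigl(\prod_{1\le j=k\le n} a_j a_k\bigr)$. The first factor is $\bigl(\prod_{k=1}^{n} a_k^{\,n}\bigr)^2$; the second factor is $\prod_{k=1}^{n} a_k^2$. Hence $P = \bigl(\prod_{k=1}^{n} a_k\bigr)^{n+1}$.

2.27  $\Delta(c^{\underline{x}}) = c^{\underline{x}}(c - x - 1) = c^{\underline{x+2}}/(c - x)$. Setting c = −2 and decreasing x by 2 yields $\Delta\bigl(-(-2)^{\underline{x-2}}\bigr) = (-2)^{\underline{x}}/x$, hence the stated sum is $(-2)^{\underline{-1}} - (-2)^{\underline{n-1}} = (-1)^n n! - 1$.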


2.28  The interchange of summation between the second and third lines is not justifiable; the terms of this sum do not converge absolutely. Everything else is perfectly correct, except that the result of $\sum_{k\ge1} [k = j - 1]\,k/j$ should perhaps have been written $[j - 1 \ge 1](j - 1)/j$ and simplified explicitly.

As opposed to imperfectly correct.

2.29  Use partial fractions to get

$$\frac{k}{4k^2 - 1} = \frac14\left(\frac{1}{2k + 1} + \frac{1}{2k - 1}\right).$$

The $(-1)^k$ factor now makes the two halves of each term cancel with their neighbors. Hence the answer is $-1/4 + (-1)^n/(8n + 4)$.

2.30  $\sum_a^b x\,\delta x = \frac12(b^{\underline2} - a^{\underline2}) = \frac12(b - a)(b + a - 1)$. So we have

$$(b - a)(b + a - 1) = 2100 = 2^2\cdot3\cdot5^2\cdot7.$$

There is one solution for each way to write 2100 = x · y where x is even and y is odd; we let $a = \frac12|x - y| + \frac12$ and $b = \frac12(x + y) + \frac12$. So the number of solutions is the number of divisors of $3\cdot5^2\cdot7$, namely 12. In general, there are $\prod_{p>2}(n_p + 1)$ ways to represent $\prod_p p^{n_p}$, where the products range over primes.
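The count of 12 in answer 2.30 can be reproduced by brute force. This sketch assumes the target is 1050, inferred from the equation $(b - a)(b + a - 1) = 2100$, and that a run is the sum of the integers k with $a \le k < b$ for positive a; `consecutive_runs` is an illustrative name.

```python
def consecutive_runs(target):
    """Return all pairs (a, b) with 1 <= a < b such that
    a + (a+1) + ... + (b-1) equals target."""
    runs = []
    for a in range(1, target + 1):
        total, b = 0, a
        while total < target:      # extend the run until it reaches the target
            total += b
            b += 1
        if total == target:
            runs.append((a, b))
    return runs


runs = consecutive_runs(1050)
assert len(runs) == 12
for a, b in runs:
    assert (b - a) * (b + a - 1) == 2100

# The count equals the number of odd divisors of 2100, i.e. divisors of 3*5^2*7.
assert sum(1 for d in range(1, 2101, 2) if 2100 % d == 0) == 12
```

The list includes the trivial one-term run (1050, 1051), which corresponds to the factorization 2100 = 2100 · 1.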

2.31  $\sum_{j,k\ge2} j^{-k} = \sum_{j\ge2} 1/j^2(1 - 1/j) = \sum_{j\ge2} 1/j(j - 1) = 1$. The second sum is, similarly, 3/4.


2.32  If $2n \le x < 2n + 1$, the sums are $0 + \cdots + n + (x - n - 1) + \cdots + (x - 2n) = n(x - n) = (x - 1) + (x - 3) + \cdots + (x - 2n + 1)$. If $2n - 1 \le x < 2n$ they are, similarly, both equal to n(x − n). (Looking ahead to Chapter 3, the formula $\lfloor\frac12(x + 1)\rfloor\bigl(x - \lfloor\frac12(x + 1)\rfloor\bigr)$ covers both cases.)

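2.33  If K is empty, $\bigwedge_{k\in K} a_k = \infty$. The basic analogies are:

$$\sum_{k\in K} c\,a_k = c\sum_{k\in K} a_k \quad\longleftrightarrow\quad \bigwedge_{k\in K} (c + a_k) = c + \bigwedge_{k\in K} a_k$$

$$\sum_{k\in K} (a_k + b_k) = \sum_{k\in K} a_k + \sum_{k\in K} b_k \quad\longleftrightarrow\quad \bigwedge_{k\in K} \min(a_k, b_k) = \min\Bigl(\bigwedge_{k\in K} a_k,\ \bigwedge_{k\in K} b_k\Bigr)$$

$$\sum_{k\in K} a_k = \sum_{p(k)\in K} a_{p(k)} \quad\longleftrightarrow\quad \bigwedge_{k\in K} a_k = \bigwedge_{p(k)\in K} a_{p(k)}$$

$$\sum_{j\in J,\,k\in K} a_{j,k} = \sum_{j\in J}\sum_{k\in K} a_{j,k} \quad\longleftrightarrow\quad \bigwedge_{j\in J,\,k\in K} a_{j,k} = \bigwedge_{j\in J}\bigwedge_{k\in K} a_{j,k}$$

$$\sum_{k\in K} a_k = \sum_{k} a_k\,[k\in K] \quad\longleftrightarrow\quad \bigwedge_{k\in K} a_k = \bigwedge_{k} a_k\cdot\infty^{[k\notin K]}$$

A permutation that consumes terms of one sign faster than those of the other can steer the sum toward any value that it likes.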


2.34  Let $K^+ = \{k \mid a_k \ge 0\}$ and $K^- = \{k \mid a_k < 0\}$. Then if, for example, n is odd, we choose $F_n$ to be $F_{n-1} \cup E_n$, where $E_n \subseteq K^-$ is sufficiently large that $\sum_{k\in(F_{n-1}\cap K^+)} a_k - \sum_{k\in E_n} (-a_k) < A^-$.

2.35  Goldbach’s sum can be shown to equal

$$\sum_{m,n\ge2} m^{-n} = \sum_{m\ge2} \frac{1}{m(m - 1)} = 1$$

as follows: By unsumming a geometric series, it equals $\sum_{k\in P,\ l\ge1} k^{-l}$; therefore the proof will be complete if we can find a one-to-one correspondence between ordered pairs (m, n) with $m, n \ge 2$ and ordered pairs (k, l) with $k \in P$ and $l \ge 1$, where $m^n = k^l$ when the pairs correspond. If $m \notin P$ we let $(m, n) \leftrightarrow (m^n, 1)$; but if $m = a^b \in P$, we let $(m, n) \leftrightarrow (a^n, b)$.

2.36  (a) By definition, g(n) − g(n − 1) = f(n). (b) By part (a), $g(g(n)) - g(g(n - 1)) = \sum_k f(k)\,[g(n - 1) < k \le g(n)] = n\bigl(g(n) - g(n - 1)\bigr) = nf(n)$. (c) By part (a) again, g(g(g(n))) − g(g(g(n − 1))) is

$$\sum_k f(k)\,[g(g(n - 1)) < k \le g(g(n))]$$
$$= \sum_{j,k} j\,[j = f(k)]\,[g(g(n - 1)) < k \le g(g(n))]$$
$$= \sum_{j,k} j\,[j = f(k)]\,[g(n - 1) < j \le g(n)]$$
$$= \sum_j j\,\bigl(g(j) - g(j - 1)\bigr)\,[g(n - 1) < j \le g(n)]$$
$$= \sum_j j f(j)\,[g(n - 1) < j \le g(n)] = n\sum_j j\,[g(n - 1) < j \le g(n)].$$

With this self-description, Golomb’s sequence wouldn’t do too well on the Dating Game.

Colin Mallows observes that the sequence can also be defined by the recurrence

f(1) = 1;  f(n + 1) = 1 + f(n + 1 − f(f(n))), for n > 0.
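Mallows's recurrence and part (b) of answer 2.36 can both be compared with a direct construction of Golomb's sequence. The sketch assumes the usual definition (f is nondecreasing and f(n) counts how many times n occurs) and $g(n) = \sum_{k\le n} f(k)$; the names `golomb` and `mallows` and the cutoff 5000 are illustrative.

```python
def golomb(limit):
    """f[1..limit] built from the self-describing property:
    the value n appears f(n) times. Index 0 is unused."""
    f = [0, 1, 2, 2]
    n = 3
    while len(f) <= limit:
        f.extend([n] * f[n])
        n += 1
    return f[: limit + 1]


def mallows(limit):
    """f[1..limit] from f(1) = 1, f(n+1) = 1 + f(n + 1 - f(f(n)))."""
    m = [0, 1]
    for n in range(1, limit):
        m.append(1 + m[n + 1 - m[m[n]]])
    return m


limit = 5000
f = golomb(limit)
assert f == mallows(limit)

# Part (b): g(g(n)) equals the sum of k*f(k) for k <= n.
g = [0]
for k in range(1, limit + 1):
    g.append(g[-1] + f[k])
weighted, n = 0, 1
while n <= limit and g[n] <= limit:
    weighted += n * f[n]
    assert g[g[n]] == weighted
    n += 1
```

The second loop stops as soon as g(n) would index past the computed prefix, so every comparison uses values that were actually generated.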

2.37  (RLG  thinks  they  probably  won’t  fit;  DEK  thinks  they  probably  will; 
OP  is  not  committing  himself.) 

3.1  $m = \lfloor \lg n \rfloor$; $l = n - 2^m = n - 2^{\lfloor \lg n \rfloor}$.

3.2  (a) $\lfloor x + .5 \rfloor$. (b) $\lceil x - .5 \rceil$.

3.3  This is $\lfloor mn - \{m\alpha\}n/\alpha \rfloor = mn - 1$, since $0 < \{m\alpha\} < 1$.

3.4  Something where no proof is required, only a lucky guess (I guess).

3.5  We have $\lfloor nx \rfloor = n\lfloor x \rfloor \iff n\lfloor x \rfloor \le \lfloor nx \rfloor < n\lfloor x \rfloor + 1 \iff n\lfloor x \rfloor \le nx < n\lfloor x \rfloor + 1 \iff nx - n\{x\} \le nx < nx - n\{x\} + 1$, by (3.5(a)), (3.7(a)),
(3.7(d)), and (3.8); and this is equivalent to $n\{x\} < 1$, when n is a positive
integer. (Notice that $n\lfloor x \rfloor \le \lfloor nx \rfloor$ for all x in this case.)

3.6  $\lfloor f(x) \rfloor = \lfloor f(\lceil x \rceil) \rfloor$.

3.7  $\lfloor n/m \rfloor + n \bmod m$.

3.8  If all boxes contain $< \lceil n/m \rceil$ objects, then $n \le (\lceil n/m \rceil - 1)m$, so
$n/m + 1 \le \lceil n/m \rceil$, contradicting (3.5). The other proof is similar.

3.9  We have $m/n - 1/q = (n \text{ mumble } m)/qn$. The process must terminate,
because $0 \le n \text{ mumble } m < m$. The denominators of the representation are
strictly increasing, hence distinct, because $qn/(n \text{ mumble } m) > q$.
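
The argument of answer 3.9 can be traced numerically. A minimal sketch, assuming the greedy choice $q = \lceil n/m \rceil$ at each step and computing "n mumble m" as `(-n) % m`; the function name is ours:

```python
from fractions import Fraction


def greedy_unit_fractions(m, n):
    """Denominators q1 < q2 < ... with m/n = 1/q1 + 1/q2 + ..., for 0 < m < n."""
    denominators = []
    while m > 0:
        q = -(-n // m)             # ceiling of n/m
        denominators.append(q)
        # m/n - 1/q = (n mumble m)/(q*n), where n mumble m = m*ceil(n/m) - n
        m, n = (-n) % m, q * n
    return denominators


print(greedy_unit_fractions(4, 13))                               # [4, 18, 468]
print(sum(Fraction(1, q) for q in greedy_unit_fractions(4, 13)))  # 4/13
```

The numerator shrinks strictly at every step, which is the termination argument; the fraction is deliberately not reduced, since the proof does not need that.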

3.10  $\lceil x + \frac12 \rceil - [(2x+1)/4 \text{ is not an integer}]$ is the nearest integer to x, if
$\{x\} \ne \frac12$; otherwise it's the nearest even integer. (See exercise 2.) Thus the
formula gives an "unbiased" way to round.

3.11  If n is an integer, $\alpha < n < \beta \iff \lfloor \alpha \rfloor < n < \lceil \beta \rceil$. The number of
integers satisfying $a < n < b$ when a and b are integers is $(b - a - 1)[b > a]$.
We would therefore get the wrong answer if $\alpha = \beta =$ integer.

3.12  Subtract $\lfloor n/m \rfloor$ from both sides, by (3.6), getting $\lceil (n \bmod m)/m \rceil = \lfloor (n \bmod m + m - 1)/m \rfloor$. Both sides are now equal to $[n \bmod m > 0]$, since
$0 \le n \bmod m < m$.

A shorter but less direct proof simply observes that the first term in
(3.24) must equal the last term in (3.25).

3.13  If they form a partition, the text's formula for $N(\alpha, n)$ implies that
$1/\alpha + 1/\beta = 1$, because the coefficients of n in the equation $N(\alpha, n) + N(\beta, n) = n$ must agree if the equation is to hold for large n. Hence $\alpha$
and $\beta$ are both rational or both irrational. If both are irrational, we do get a
partition, as shown in the text. If both can be written with numerator m, the
value $m - 1$ occurs in neither spectrum. (However, Golomb [121] has observed
that the sets $\{\lfloor n\alpha \rfloor \mid n \ge 1\}$ and $\{\lceil n\beta \rceil - 1 \mid n \ge 1\}$ always do form a partition,
when $1/\alpha + 1/\beta = 1$.)

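3.14  It's obvious if $ny = 0$, otherwise true by (3.21) and (3.6).

3.15  Plug in $\lceil mx \rceil$ for n in (3.24): $\lceil mx \rceil = \lceil x \rceil + \lceil x - \frac1m \rceil + \cdots + \lceil x - \frac{m-1}{m} \rceil$.

3.16  The formula $n \bmod 3 = 1 + \frac13\bigl((\omega - 1)\omega^n - (\omega + 2)\omega^{2n}\bigr)$ can be verified
by checking it when $0 \le n < 3$.

A general formula for $n \bmod m$, when m is any positive integer, appears
in exercise 7.25.

3.17  $\sum_{j,k} [0 \le k < m][1 \le j \le x + k/m] = \sum_{j,k} [0 \le k < m][1 \le j \le \lceil x \rceil] \times [k \ge m(j - x)] = \sum_{1 \le j \le \lceil x \rceil} \sum_k [0 \le k < m] - \sum_{j = \lceil x \rceil} \sum_k [0 \le k < m(j - x)] = m\lceil x \rceil - \lceil m(\lceil x \rceil - x) \rceil = -\lceil -mx \rceil = \lfloor mx \rfloor$.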

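3.18  We have

$$S = \sum_{0 \le j < \lceil n\alpha \rceil} \; \sum_{k \ge n} [\,j\alpha^{-1} \le k < (j + v)\alpha^{-1}\,].$$

If $j \le n\alpha - 1 \le n\alpha - v$, there is no contribution, because $(j + v)\alpha^{-1} \le n$.
Hence $j = \lfloor n\alpha \rfloor$ is the only case that matters, and the value in that case
equals $\lceil (\lfloor n\alpha \rfloor + v)\alpha^{-1} \rceil - n \le \lceil v\alpha^{-1} \rceil$.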

3.19  If and only if b is an integer. (If b is an integer, $\log_b x$ is a continuous,
increasing function that takes integer values only at integer points. If b is not
an integer, the condition fails when $x = b$.)

3.20  We have $\sum_k kx\,[\alpha \le kx \le \beta] = x \sum_k k\,[\lceil \alpha/x \rceil \le k \le \lfloor \beta/x \rfloor]$, which sums
to $\frac12 x\bigl(\lfloor \beta/x \rfloor \lfloor \beta/x + 1 \rfloor - \lceil \alpha/x \rceil \lceil \alpha/x - 1 \rceil\bigr)$.

3.21  If $10^n \le 2^M < 10^{n+1}$, there are exactly $n + 1$ such powers of 2, because
there's exactly one n-digit power of 2 for each n. Therefore the answer is
$1 + \lfloor M \log 2 \rfloor$.

Note: The number of powers of 2 with leading digit l is more difficult,
when $l > 1$; it's $\sum_{0 \le n \le M} \bigl(\lfloor n \log 2 - \log l \rfloor - \lfloor n \log 2 - \log(l + 1) \rfloor\bigr)$.

3.22  All terms are the same for n and $n - 1$ except the kth, where $n = 2^{k-1} q$
and q is odd; we have $S_n = S_{n-1} + 1$ and $T_n = T_{n-1} + 2^k q$. Hence $S_n = n$
and $T_n = n(n + 1)$.


3.23  $X_n = m \iff \frac12 m(m - 1) < n \le \frac12 m(m + 1) \iff m^2 - m + \frac14 < 2n < m^2 + m + \frac14 \iff m - \frac12 < \sqrt{2n} < m + \frac12$.

3.24  Let $\beta = \alpha/(\alpha + 1)$. Then the number of times the nonnegative integer
m occurs in Spec($\beta$) is exactly one more than the number of times it occurs
in Spec($\alpha$). Why? Because $N(\beta, n) = N(\alpha, n) + n + 1$.


3.25  Continuing the development in the text, if we could find a value of m
such that $K_m \le m$, we could violate the stated inequality at $n + 1$ when
$n = 2m + 1$. (Also when $n = 3m + 1$ and $n = 3m + 2$.) But the existence of
such an $m = n' + 1$ requires that $2K_{\lfloor n'/2 \rfloor} \le n'$ or $3K_{\lfloor n'/3 \rfloor} \le n'$, i.e., that

$$K_{\lfloor n'/2 \rfloor} \le \lfloor n'/2 \rfloor \quad\text{or}\quad K_{\lfloor n'/3 \rfloor} \le \lfloor n'/3 \rfloor.$$

Aha. This goes down further and further, implying that $K_0 \le 0$; but $K_0 = 1$.
What we really want to prove is that $K_n$ is strictly greater than n, for
all $n \ge 0$. In fact, it's easy to prove this by induction, although it's a stronger
result than the one we couldn't prove!

(This exercise teaches an important lesson. It's more an exercise about
the nature of induction than about properties of the floor function.)

3.26  Induction, using the stronger hypothesis

$$D_n^{(q)} \le (q - 1)\Bigl(\Bigl(\frac{q}{q-1}\Bigr)^{n+1} - 1\Bigr), \quad\text{for } n \ge 0.$$

3.27  If $D_n^{(3)} = 2^m b - a$, where b is odd and a is 0 or 1, then $D_{n+m}^{(3)} = 3^m b - a$.

3.28  The key observation is that $a_n = m^2$ implies $a_{n+2k+1} = (m + k)^2 + m - k$
and $a_{n+2k+2} = (m + k)^2 + 2m$, for $0 \le k \le m$; hence $a_{n+2m+1} = (2m)^2$. The
solution can be written in a nice form discovered by Carl Witty:

$$a_{n-1} = 2^l + \Bigl\lfloor \Bigl(\frac{n - l}{2}\Bigr)^2 \Bigr\rfloor, \quad\text{when } 2^l + l \le n < 2^{l+1} + l + 1.$$

In trying to devise a proof by mathematical induction, you may fail for
two opposite reasons. You may fail because you try to prove too much:
Your P(n) is too heavy a burden. Yet you may also fail because you try
to prove too little: Your P(n) is too weak a support. In general, you
have to balance the statement of your theorem so that the support is
just enough for the burden.
- G. Pólya [238]
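
Witty's formula can be compared with the sequence itself. The sketch below assumes the recurrence $a_0 = 1$, $a_n = a_{n-1} + \lfloor \sqrt{a_{n-1}} \rfloor$, which is what the "key observation" above is consistent with; the function names are ours.

```python
from math import isqrt


def a_direct(count):
    """a_0, ..., a_{count-1} from a_0 = 1, a_n = a_{n-1} + floor(sqrt(a_{n-1}))."""
    a = [1]
    while len(a) < count:
        a.append(a[-1] + isqrt(a[-1]))
    return a


def a_witty(n):
    """Closed form for a_{n-1}, valid for n >= 1."""
    l = 0
    while 2 ** (l + 1) + l + 1 <= n:    # find l with 2^l + l <= n < 2^(l+1) + l + 1
        l += 1
    return 2 ** l + (n - l) ** 2 // 4   # floor(((n - l)/2)^2)


print(a_direct(12))                     # [1, 2, 3, 4, 6, 8, 10, 13, 16, 20, 24, 28]
```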


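3.29  $D(\alpha', \lfloor \alpha n \rfloor)$ is at most the maximum of the right-hand side of

$$s(\alpha', \lfloor n\alpha \rfloor, v') = -s(\alpha, n, v) + S - \epsilon - [0 \text{ or } 1] - v' + [0 \text{ or } 1].$$

This logic is seriously floored.

3.30  $X_n = \alpha^{2^n} + \alpha^{-2^n}$, by induction; and $X_n$ is an integer.

3.31  Here's an "elegant," "impressive" proof that gives no clue about how
it was discovered:

$$\begin{aligned}
\lfloor x \rfloor + \lfloor y \rfloor + \lfloor x + y \rfloor &= \lfloor x + \lfloor y \rfloor \rfloor + \lfloor x + y \rfloor \\
 &\le \lfloor x + \tfrac12 \lfloor 2y \rfloor \rfloor + \lfloor x + \tfrac12 \lfloor 2y \rfloor + \tfrac12 \rfloor \\
 &= \lfloor 2x + \lfloor 2y \rfloor \rfloor = \lfloor 2x \rfloor + \lfloor 2y \rfloor.
\end{aligned}$$

But there's also a simple, graphical proof based on the observation that we
need to consider only the case $0 \le x, y < 1$. Then the functions look like this
in the plane:

[figure]

A slightly stronger result is possible, namely

$$\lceil x \rceil + \lfloor y \rfloor + \lfloor x + y \rfloor \le \lceil 2x \rceil + \lfloor 2y \rfloor;$$

but this is stronger only when $\{x\} = \frac12$. If we replace $(x, y)$ by $(-x, x + y)$ in
this identity and apply the reflective law (3.4), we get

$$\lfloor y \rfloor + \lfloor x + y \rfloor + \lfloor 2x \rfloor \le \lfloor x \rfloor + \lfloor 2x + 2y \rfloor.$$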

3.32  Let f(x) be the sum in question. Since $f(x) = f(-x)$, we may assume
that $x \ge 0$. The terms are bounded by $2^k$ as $k \to -\infty$ and by $x^2/2^k$ as
$k \to +\infty$, so the sum exists for all real x.

We have $f(2x) = 2\sum_k 2^{k-1} \|x/2^{k-1}\|^2 = 2f(x)$. Let $f(x) = l(x) + r(x)$
where l(x) is the sum for $k \le 0$ and r(x) is the sum for $k > 0$. Then $l(x + 1) = l(x)$, and $l(x) \le 1/2$ for all x. When $0 \le x < 1$, we have $r(x) = x^2/2 + x^2/4 + \cdots = x^2$, and $r(x + 1) = (x - 1)^2/2 + (x + 1)^2/4 + (x + 1)^2/8 + \cdots = x^2 + 1$.
Hence $f(x + 1) = f(x) + 1$, when $0 \le x < 1$.

We can now prove by induction that $f(x + n) = f(x) + n$ for all integers
$n \ge 0$, when $0 \le x < 1$. In particular, $f(n) = n$. Therefore in general,
$f(x) = 2^{-m} f(2^m x) = 2^{-m} \lfloor 2^m x \rfloor + 2^{-m} f(\{2^m x\})$. But $f(\{2^m x\}) = l(\{2^m x\}) + r(\{2^m x\}) \le \frac12 + 1$; so $|f(x) - x| \le |2^{-m} \lfloor 2^m x \rfloor - x| + 2^{-m} \cdot \frac32 \le 2^{-m} \cdot \frac52$ for all m.
The inescapable conclusion is that $f(x) = |x|$ for all real x.

3.33  Let $r = n - \frac12$ be the radius of the circle. (a) There are $2n - 1$ horizontal
lines and $2n - 1$ vertical lines between cells of the board, and the circle crosses
each of these lines twice. Since $r^2$ is not an integer, the Pythagorean theorem
tells us that the circle doesn't pass through the corner of any cell. Hence
the circle passes through as many cells as there are crossing points, namely
$8n - 4 = 8r$. (The same formula gives the number of cells at the edge of the
board.) (b) $f(n, k) = 4\lfloor \sqrt{r^2 - k^2} \rfloor$.

It follows from (a) and (b) that

$$\tfrac14 \pi r^2 - 2r \;\le\; \sum_{0 < k < r} \lfloor \sqrt{r^2 - k^2} \rfloor \;\le\; \tfrac14 \pi r^2, \qquad r = n - \tfrac12.$$

The task of obtaining more precise estimates of this sum is a famous problem
in number theory, investigated by Gauss and many others; see Dickson [65,
volume 2, chapter 6].

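3.34  (a) Let $m = \lceil \lg n \rceil$. We can add $2^m - n$ terms to simplify the
calculations at the boundary:

$$\begin{aligned}
f(n) + (2^m - n)m &= \sum_{k=1}^{2^m} \lceil \lg k \rceil = \sum_{j,k} j\,[j = \lceil \lg k \rceil][1 \le k \le 2^m] \\
 &= \sum_{j=1}^{m} j\,2^{j-1} = 2^m(m - 1) + 1.
\end{aligned}$$

Consequently $f(n) = nm - 2^m + 1$.

(b) We have $\lceil n/2 \rceil = \lfloor (n + 1)/2 \rfloor$, and it follows that the solution to the
general recurrence $g(n) = a(n) + g(\lceil n/2 \rceil) + g(\lfloor n/2 \rfloor)$ must satisfy $\Delta g(n) = \Delta a(n) + \Delta g(\lfloor n/2 \rfloor)$. In particular, when $a(n) = n - 1$, $\Delta f(n) = 1 + \Delta f(\lfloor n/2 \rfloor)$
is satisfied by the number of bits in the binary representation of n, namely
$\lceil \lg(n + 1) \rceil$. Now convert from $\Delta$ to $\Sigma$.

A more direct solution can be based on the identities $\lceil \lg 2j \rceil = \lceil \lg j \rceil + 1$
and $\lceil \lg(2j - 1) \rceil = \lceil \lg j \rceil + [j > 1]$, for $j \ge 1$.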
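
Both parts of answer 3.34 lend themselves to a quick numerical check. This is a sketch with names of our own; `ceil_lg` relies on the fact that $\lceil \lg k \rceil$ is the bit length of $k - 1$ for $k \ge 1$.

```python
def ceil_lg(k):
    """Ceiling of lg k for an integer k >= 1."""
    return (k - 1).bit_length()


def f_sum(n):
    """f(n) = sum of ceil(lg k) for 1 <= k <= n, straight from the definition."""
    return sum(ceil_lg(k) for k in range(1, n + 1))


def f_closed(n):
    """The closed form n*m - 2^m + 1 with m = ceil(lg n), for n >= 1."""
    m = ceil_lg(n)
    return n * m - 2 ** m + 1


# Part (b): f(n) = n - 1 + f(ceil(n/2)) + f(floor(n/2)).
n = 37
print(f_sum(n), f_closed(n), n - 1 + f_sum((n + 1) // 2) + f_sum(n // 2))
```

The three printed numbers coincide; the same holds for every n we tried.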

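3.35  $(n + 1)^2 n!\,e = A_n + (n + 1)^2 + (n + 1) + B_n$, where

$$A_n = \frac{(n+1)^2 n!}{0!} + \frac{(n+1)^2 n!}{1!} + \cdots + \frac{(n+1)^2 n!}{(n-1)!} \quad\text{is a multiple of } n$$

and

$$\begin{aligned}
B_n &= \frac{(n+1)^2 n!}{(n+2)!} + \frac{(n+1)^2 n!}{(n+3)!} + \cdots \\
 &= \frac{n+1}{n+2}\Bigl(1 + \frac{1}{n+3} + \frac{1}{(n+3)(n+4)} + \cdots\Bigr) \\
 &< \frac{n+1}{n+2}\Bigl(1 + \frac{1}{n+3} + \frac{1}{(n+3)(n+3)} + \cdots\Bigr) = \frac{(n+1)(n+3)}{(n+2)^2}
\end{aligned}$$

is less than 1. Hence the answer is 2 mod n.

3.36  The sum is

$$\begin{aligned}
\sum_{k,l,m} 2^{-l} 4^{-m}\,[m = \lfloor \lg l \rfloor][l = \lfloor \lg k \rfloor][1 < k < 2^{2^n}]
 &= \sum_{k,l,m} 2^{-l} 4^{-m}\,[2^m \le l < 2^{m+1}][2^l \le k < 2^{l+1}][0 \le m < n] \\
 &= \sum_{l,m} 4^{-m}\,[2^m \le l < 2^{m+1}][0 \le m < n] \\
 &= \sum_m 2^{-m}\,[0 \le m < n] = 2(1 - 2^{-n}).
\end{aligned}$$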


3.37  First consider the case $m < n$, which breaks into subcases based on
whether $m < \frac12 n$; then show that both sides change in the same way when
m is increased by n.

This is really only a level 4 problem, in spite of the way it's stated.

3.38  At most one can be noninteger. Discard all integer $x_k$, and suppose
that n are left. When $\{x\} \ne 0$, the average of $\{mx\}$ as $m \to \infty$ lies between $\frac14$
and $\frac12$; hence $\{mx_1\} + \cdots + \{mx_n\} - \{mx_1 + \cdots + mx_n\}$ cannot have average
value zero when $n > 1$.

But the argument just given relies on a difficult theorem about uniform
distribution. An elementary proof is possible, sketched here for $n = 2$: Let
$P_m$ be the point $(\{mx\}, \{my\})$. Divide the unit square $0 \le x, y < 1$ into
triangular regions A and B according as $x + y < 1$ or $x + y \ge 1$. We want to
show that $P_m \in B$ for some m, if $\{x\}$ and $\{y\}$ are nonzero. If $P_1 \in B$, we're
done. Otherwise there is a disk D of radius $\epsilon > 0$ centered at $P_1$ such that
$D \subseteq A$. By Dirichlet's box principle, the sequence $P_1, \ldots, P_N$ must contain
two points with $|P_k - P_j| < \epsilon$ and $k > j$, if N is large enough.

It follows that $P_{k-j-1}$ is within $\epsilon$ of $(1, 1) - P_1$; hence $P_{k-j-1} \in B$.

3.39  Replace j by $b - j$ and add the term $j = 0$ to the sum, so that exercise
15 can be used for the sum on j. The result,

$$\lceil x/b^k \rceil - \lceil x/b^{k+1} \rceil + b - 1,$$

telescopes when summed on k.

3.40  Let $\lfloor 2\sqrt{n} \rfloor = 4k + r$ where $-2 \le r < 2$, and let $m = \lfloor \sqrt{n} \rfloor$. Then the
following relationships can be proved by induction:

segment    r     m       x                 y                 if and only if
W_k       -2    2k-1    m(m+1) - n - k    k                 (2k-1)(2k-1) ≤ n ≤ (2k-1)(2k)
S_k       -1    2k-1    -k                m(m+1) - n + k    (2k-1)(2k) < n < (2k)(2k)
E_k        0    2k      n - m(m+1) + k    -k                (2k)(2k) ≤ n ≤ (2k)(2k+1)
N_k        1    2k      k                 n - m(m+1) - k    (2k)(2k+1) < n < (2k+1)(2k+1)

Thus, when $k \ge 1$, $W_k$ is a segment of length 2k where the path travels west
and $y(n) = k$; $S_k$ is a segment of length $2k - 2$ where the path travels south
and $x(n) = -k$; etc. (a) The desired formula is therefore

$$y(n) = (-1)^m \bigl((n - m(m + 1)) \cdot [\lfloor 2\sqrt{n} \rfloor \text{ is odd}] - \lceil \tfrac12 m \rceil\bigr).$$

(b) On all segments, $k = \max(|x(n)|, |y(n)|)$. On segments $W_k$ and $S_k$ we
have $x < y$ and $n + x + y = m(m + 1) = (2k)^2 - 2k$; on segments $E_k$ and $N_k$
we have $x \ge y$ and $n - x - y = m(m + 1) = (2k)^2 + 2k$. Hence the sign is
$(-1)^{[x(n) < y(n)]}$.
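
The formula in part (a) can be checked against a direct walk. The sketch below assumes the layout that the table describes: the path starts with 0 at the origin, puts 1 at (0, 1), and then keeps turning left (west, south, east, north, ...) with runs of lengths 1, 1, 2, 2, 3, 3, and so on. Function names are ours.

```python
from math import isqrt


def spiral_points(count):
    """Coordinates of 0, 1, ..., count-1 along the assumed spiral."""
    x = y = 0
    points = [(0, 0)]
    directions = [(0, 1), (-1, 0), (0, -1), (1, 0)]   # north, west, south, east
    run, d = 1, 0
    while len(points) < count:
        for _ in range(2):                            # two runs of each length
            dx, dy = directions[d % 4]
            for _ in range(run):
                x, y = x + dx, y + dy
                points.append((x, y))
            d += 1
        run += 1
    return points[:count]


def y_formula(n):
    """y(n) = (-1)^m ((n - m(m+1)) [floor(2 sqrt n) is odd] - ceil(m/2))."""
    m = isqrt(n)
    odd = isqrt(4 * n) % 2                            # floor(2*sqrt(n)) = floor(sqrt(4n))
    return (-1) ** m * ((n - m * (m + 1)) * odd - (m + 1) // 2)


print(spiral_points(10))
print([y_formula(n) for n in range(10)])
```

The y-coordinates of the walked points agree with `y_formula` for every n we tried.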

3.41  Since $1/\phi + 1/\phi^2 = 1$, the stated sequences do partition the positive
integers. Since the condition $g(n) = f(f(n)) + 1$ determines f and g uniquely,
we need only show that $\lfloor \lfloor n\phi \rfloor \phi \rfloor + 1 = \lfloor n\phi^2 \rfloor$ for all $n > 0$. This follows
from exercise 3, with $\alpha = \phi$ and $n = 1$.
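
The identity in answer 3.41 can be tested in exact integer arithmetic. A small sketch (the helper name is ours): since $\phi = (1 + \sqrt5)/2$, we have $\lfloor n\phi \rfloor = \lfloor (n + \lfloor \sqrt{5n^2} \rfloor)/2 \rfloor$, and $\lfloor n\phi^2 \rfloor = \lfloor n\phi \rfloor + n$ because $\phi^2 = \phi + 1$.

```python
from math import isqrt


def floor_n_phi(n):
    """floor(n * phi), computed without floating point."""
    return (n + isqrt(5 * n * n)) // 2


for n in range(1, 8):
    f = floor_n_phi(n)                   # f(n) = floor(n*phi)
    g = f + n                            # g(n) = floor(n*phi^2)
    print(n, f, g, floor_n_phi(f) + 1)   # last two columns agree
```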

3.42  No; an argument like the analysis of the two-spectrum case in the text
and in exercise 13 shows that a tripartition occurs if and only if $1/\alpha + 1/\beta + 1/\gamma = 1$ and

$$\Bigl\{\frac{n+1}{\alpha}\Bigr\} + \Bigl\{\frac{n+1}{\beta}\Bigr\} + \Bigl\{\frac{n+1}{\gamma}\Bigr\} = 1,$$

for all $n > 0$. But the average value of $\{(n + 1)/\alpha\}$ is 1/2 if $\alpha$ is irrational, by
the theorem on uniform distribution. The parameters can't all be rational,
and if $\gamma = m/n$ the average is $3/2 - 1/(2n)$. Hence $\gamma$ must be an integer, but
this doesn't work either. (There's also a proof of impossibility that uses only
simple principles, without the theorem on uniform distribution; see [125].)


3.43  One step of unfolding the recurrence for $K_n$ gives the minimum of the
four numbers $1 + a + a \cdot b \cdot K_{\lfloor (n-1-a)/(a \cdot b) \rfloor}$, where a and b are each 2 or 3.
(This simplification involves an application of (3.11) to remove floors within
floors, together with the identity $x + \min(y, z) = \min(x + y, x + z)$. We must
omit terms with negative subscripts; i.e., with $n - 1 - a < 0$.)

Continuing along such lines now leads to the following interpretation:
$K_n$ is the least number $> n$ in the multiset S of all numbers of the form

$$1 + a_1 + a_1 a_2 + a_1 a_2 a_3 + \cdots + a_1 a_2 a_3 \ldots a_m,$$

where $m \ge 0$ and each $a_k$ is 2 or 3. Thus,

$$S = \{1, 3, 4, 7, 9, 10, 13, 15, 19, 21, 22, 27, 28, 31, 31, \ldots\};$$

the number 31 is in S "twice" because it has two representations $1 + 2 + 4 + 8 + 16 = 1 + 3 + 9 + 18$. (Incidentally, Michael Fredman [108] has shown that
$\lim_{n \to \infty} K_n/n = 1$, i.e., that S has no enormous gaps.)

Too easy.

A more interesting (still unsolved) problem: Restrict both $\alpha$ and $\beta$ to
be < 1, and ask when the given multiset determines the unordered
pair $\{\alpha, \beta\}$.
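
The multiset interpretation is easy to test by machine. This sketch assumes the recurrence $K_0 = 1$, $K_{n+1} = 1 + \min(2K_{\lfloor n/2 \rfloor}, 3K_{\lfloor n/3 \rfloor})$, which is the one that unfolds as described above; the function names are ours.

```python
def knuth_numbers(count):
    """K_0, ..., K_{count-1} from the assumed recurrence."""
    K = [1]
    for n in range(count - 1):
        K.append(1 + min(2 * K[n // 2], 3 * K[n // 3]))
    return K


def multiset_S(limit):
    """Sorted list, with repetitions, of all 1 + a1 + a1*a2 + ... <= limit, each a_k in {2, 3}."""
    found = []
    stack = [(1, 1)]                     # (sum so far, product a1*...*am)
    while stack:
        total, product = stack.pop()
        found.append(total)
        for a in (2, 3):
            if total + product * a <= limit:
                stack.append((total + product * a, product * a))
    return sorted(found)


print(multiset_S(31))   # [1, 3, 4, 7, 9, 10, 13, 15, 19, 21, 22, 27, 28, 31, 31]
print(knuth_numbers(11))
```

For each n we tried, $K_n$ is the least element of `multiset_S` that exceeds n.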

3.44  Let $d_n^{(q)} = D_{n-1}^{(q)} \text{ mumble } (q - 1)$, so that $D_n^{(q)} = (qD_{n-1}^{(q)} + d_n^{(q)})/(q - 1)$
and $a_n^{(q)} = \lceil D_{n-1}^{(q)}/(q - 1) \rceil$. Now $D_{k-1}^{(q)} \le (q - 1)n \iff a_k^{(q)} \le n$, and the
results follow. (This is the solution found by Euler [94], who determined the
a's and d's sequentially without realizing that a single sequence $D_n^{(q)}$ would
suffice.)

3.45  Let $\alpha > 1$ satisfy $\alpha + 1/\alpha = 2m$. Then we find $2Y_n = \alpha^{2^n} + \alpha^{-2^n}$, and
it follows that $Y_n = \lceil \alpha^{2^n}/2 \rceil$.

3.46  The hint follows from (3.9), since $2n(n + 1) = \lfloor 2(n + \frac12)^2 \rfloor$. Let $n + \theta = (\sqrt2^{\,l} + \sqrt2^{\,l-1})m$ and $n' + \theta' = (\sqrt2^{\,l+1} + \sqrt2^{\,l})m$, where $0 \le \theta, \theta' < 1$.
Then $\theta' = 2\theta \bmod 1 = 2\theta - d$, where d is 0 or 1. We want to prove that
$n' = \lfloor \sqrt2 (n + \frac12) \rfloor$; this equality holds if and only if

$$0 \le \theta'(2 - \sqrt2) + \sqrt2 (1 - d) < 2.$$

To solve the recurrence, note that Spec($1 + 1/\sqrt2$) and Spec($1 + \sqrt2$) partition
the positive integers; hence any positive integer a can be written uniquely in
the form $a = \lfloor (\sqrt2^{\,l} + \sqrt2^{\,l-1})m \rfloor$, where l and m are integers with m odd
and $l \ge 0$. It follows that $L_n = \lfloor (\sqrt2^{\,l+n} + \sqrt2^{\,l+n-1})m \rfloor$.

3.47  (a) $c = -\frac12$. (b) c is an integer. (c) $c = 0$. (d) c is arbitrary. See the
answer to exercise 1.2.4-40 in [173] for more general results.

3.48  (Solution by Heinrich Rolletschek.) We can replace $(\alpha, \beta)$ by $(\{\beta\}, \alpha + \lfloor \beta \rfloor)$ without changing $\lfloor n\alpha \rfloor + \lfloor n\beta \rfloor$. Hence the condition $\alpha = \{\beta\}$ is
necessary. It is also sufficient: Let $m = \lfloor \beta \rfloor$ be the least element of the given
multiset, and let S be the multiset obtained from the given one by subtracting
mn from the nth smallest element, for all n. If $\alpha = \{\beta\}$, consecutive elements
of S differ by either 0 or 2, hence the multiset $\frac12 S$ = Spec($\alpha$) determines $\alpha$.

3.49  According to unpublished notes of William A. Veech, it is sufficient to
have $\alpha\beta$, $\beta$, and 1 linearly independent over the rationals.

3.50  H. S. Wilf observes that the functional equation $f(x^2 - 1) = f(x)^2$ would
determine f(x) for all $x \ge \phi$ if we knew f(x) on any interval $(\phi \,..\, \phi + \epsilon)$.

3.51  There are infinitely many ways to partition the positive integers into
three or more generalized spectra with irrational $\alpha_k$; for example,

$$\text{Spec}(2\alpha; 0) \cup \text{Spec}(4\alpha; -\alpha) \cup \text{Spec}(4\alpha; -3\alpha) \cup \text{Spec}(\beta; 0)$$

works. But there's a precise sense in which all such partitions arise by
"expanding" a basic one, Spec($\alpha$) $\cup$ Spec($\beta$); see [128]. The only known rational
examples, e.g.,

$$\text{Spec}(7; -3) \cup \text{Spec}(\tfrac72; -1) \cup \text{Spec}(\tfrac74; 0),$$

are based on parameters like those in the stated conjecture, which is due to
A. S. Fraenkel [103].
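
The rational example can be checked directly. The sketch below assumes that the generalized spectrum Spec($\alpha$; $\beta$) is the multiset of values $\lfloor n\alpha + \beta \rfloor$ for n = 1, 2, 3, ...; the function name is ours.

```python
from fractions import Fraction
from math import floor


def generalized_spectrum(alpha, beta, limit):
    """Elements floor(n*alpha + beta) that are <= limit, for n = 1, 2, 3, ..."""
    out, n = [], 1
    while floor(n * alpha + beta) <= limit:
        out.append(floor(n * alpha + beta))
        n += 1
    return out


parts = [
    generalized_spectrum(Fraction(7), Fraction(-3), 30),
    generalized_spectrum(Fraction(7, 2), Fraction(-1), 30),
    generalized_spectrum(Fraction(7, 4), Fraction(0), 30),
]
for part in parts:
    print(part)
print(sorted(sum(parts, [])) == list(range(1, 31)))   # True
```

Under that assumption the three spectra cover the residues 4; 2 and 6; and 0, 1, 3, 5 modulo 7 respectively, which is why every positive integer appears exactly once.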

3.52  Partial results are discussed in [77, pages 30-31].

4.1  1, 2, 4, 6, 16, 12.
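
These six values can be recomputed by brute force. The sketch assumes the exercise asks for the smallest positive integer with exactly k divisors, for $1 \le k \le 6$; the function names are ours.

```python
def divisor_count(n):
    """Number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def smallest_with_divisors(k):
    """Smallest positive integer having exactly k divisors."""
    n = 1
    while divisor_count(n) != k:
        n += 1
    return n


print([smallest_with_divisors(k) for k in range(1, 7)])   # [1, 2, 4, 6, 16, 12]
```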

4.2  Note that $m_p + n_p = \min(m_p, n_p) + \max(m_p, n_p)$. The recurrence
$\text{lcm}(m, n) = \bigl(n/(n \bmod m)\bigr)\,\text{lcm}(n \bmod m, m)$ is valid but not really
advisable for computing lcm's; the best way known to compute lcm(m, n) is to
compute gcd(m, n) first and then to divide mn by the gcd.

4.3  This holds if x is an integer, but $\pi(x)$ is defined for all real x. The
correct formula,

$$\pi(x) - \pi(x - 1) = [\lfloor x \rfloor \text{ is prime}],$$

is easy to verify.

4.4  Between $\frac10$ and $\frac{0}{-1}$ we'd have a left-right reflected Stern-Brocot tree
with all denominators negated, etc. So the result is all fractions m/n with
$m \perp n$. The condition $m'n - mn' = 1$ still holds throughout the construction.
(This is called the Stern-Brocot wreath, because we can conveniently regard
the final $\frac01$ as identical to the first $\frac01$, thereby joining the trees in a cycle at
the top. The Stern-Brocot wreath has interesting applications to computer
graphics because it represents all rational directions in the plane.)

4.5  $L^k = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$ and $R^k = \begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix}$; this holds even when $k < 0$. (We will find a
general formula for any product of L's and R's in Chapter 6.)

4.6  $a = b$. (Chapter 3 defined $x \bmod 0 = x$, primarily so that this would
be true.)

4.7  We need $m \bmod 10 = 0$, $m \bmod 9 = k$, and $m \bmod 8 = 1$. But m can't
be both even and odd.

"Man made the integers: All else is Dieudonné."
- R. K. Guy

After all, 'mod y' sort of means "pretend y is zero." So if it already is,
there's nothing to pretend.

4.8  We want $10x + 6y \equiv 10x + y \pmod{15}$; hence $5y \equiv 0 \pmod{15}$; hence
$y \equiv 0 \pmod 3$. We must have y = 0 or 3, and x = 0 or 1.

4.9  $3^{2k+1} \bmod 4 = 3$, so $(3^{2k+1} - 1)/2$ is odd. The stated number is divisible
by $(3^7 - 1)/2$ and $(3^{11} - 1)/2$ (and by other numbers).

4.10  $999(1 - \frac13)(1 - \frac1{37}) = 648$.

4.11  $\sigma(0) = 1$; $\sigma(1) = -1$; $\sigma(n) = 0$ for $n > 1$. (Generalized Möbius
functions defined on arbitrary partially ordered structures have interesting
and important properties, first explored by Weisner [299] and developed by
many other people, notably Gian-Carlo Rota [254].)

4.12  $\sum_{d \backslash m} \sum_{k \backslash d} \mu(d/k)\,g(k) = \sum_{k \backslash m} \sum_{d \backslash (m/k)} \mu(d)\,g(k) = \sum_{k \backslash m} g(k) \times [m/k = 1] = g(m)$, by (4.7) and (4.9).

4.13  (a) $n_p \le 1$ for all p; (b) $\mu(n) \ne 0$.

4.14  True when $k > 0$. Use (4.12), (4.14), and (4.15).

4.15  No. For example, $e_n \bmod 5$ = [2 or 3]; $e_n \bmod 11$ = [2, 3, 7, or 10].

4.16  $1/e_1 + 1/e_2 + \cdots + 1/e_n = 1 - 1/(e_n(e_n - 1)) = 1 - 1/(e_{n+1} - 1)$.
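
Answers 4.15 and 4.16 are both easy to confirm for the first several Euclid numbers. A sketch, assuming the usual definition $e_1 = 2$, $e_{n+1} = e_n^2 - e_n + 1$ (equivalently $e_n = e_1 \ldots e_{n-1} + 1$); the function name is ours.

```python
from fractions import Fraction


def euclid_numbers(count):
    """e_1, ..., e_count with e_1 = 2 and e_{n+1} = e_n*(e_n - 1) + 1."""
    e = [2]
    while len(e) < count:
        e.append(e[-1] * (e[-1] - 1) + 1)
    return e


e = euclid_numbers(8)
print(e[:5])                              # [2, 3, 7, 43, 1807]
print(sorted({x % 5 for x in e}))         # [2, 3]
print(sorted({x % 11 for x in e}))        # [2, 3, 7, 10]
print(sum(Fraction(1, x) for x in e[:4]), 1 - Fraction(1, e[4] - 1))
```

The last line prints the same fraction twice, which is the identity of answer 4.16 for n = 4.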

4.17  We have $f_n \bmod f_m = 2$; hence $\gcd(f_n, f_m) = \gcd(2, f_m) = 1$.
(Incidentally, the relation $f_n = f_0 f_1 \ldots f_{n-1} + 2$ is very similar to the recurrence
that defines the Euclid numbers $e_n$.)

4.18  If $n = qm$ and q is odd, $2^n + 1 = (2^m + 1)(2^{n-m} - 2^{n-2m} + \cdots - 2^m + 1)$.

4.19  Let $p_1 = 2$ and let $p_n$ be the smallest prime greater than $2^{p_{n-1}}$. Then
$2^{p_{n-1}} < p_n < 2^{p_{n-1}+1}$, and it follows that we can take $b = \lim_{n \to \infty} \lg^{(n)} p_n$
where $\lg^{(n)}$ is the function lg iterated n times. The stated numerical value
comes from $p_2 = 5$, $p_3 = 37$. It turns out that $p_4 = 2^{37} + 9$, and this gives
the more precise value

$$b \approx 1.2516475977905$$

(but no clue about $p_5$).

4.20  By Bertrand's postulate, $P_n < 10^n$. Let

$$K = \sum_{k \ge 1} 10^{-k^2} P_k = .200300005\ldots.$$

Then $10^{n^2} K \equiv P_n + \text{fraction} \pmod{10^{2n-1}}$.
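
The extraction of $P_n$ from K can be watched in exact arithmetic. A sketch, assuming $P_k$ denotes the kth prime and truncating K after ten terms (so only $n \le 10$ is meaningful); the names `PRIMES`, `K_TRUNCATED`, and `prime_from_constant` are ours.

```python
from fractions import Fraction
from math import floor

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# K truncated after its first ten terms: sum of P_k / 10^(k^2).
K_TRUNCATED = sum(Fraction(p, 10 ** (k * k)) for k, p in enumerate(PRIMES, start=1))


def prime_from_constant(n):
    """floor(10^(n^2) * K) mod 10^(2n-1), which should be the nth prime."""
    return floor(10 ** (n * n) * K_TRUNCATED) % 10 ** (2 * n - 1)


print([prime_from_constant(n) for n in range(1, 11)])
```

The terms with $k < n$ are multiples of $10^{2n-1}$ and the terms with $k > n$ add up to less than 1, so only $P_n$ survives. Of course K has to be built from the primes in the first place, so nothing is gained.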

4.21  The first sum is $\pi(n)$, since the summand is [k + 1 is prime]. The
inner sum in the second is $\sum_{1 \le k < m} [k \backslash m]$, so it is greater than 1 if and only
if m is composite; again we get $\pi(n)$. Finally $\lceil \{m/n\} \rceil = [n \nmid m]$, so the third
sum is an application of Wilson's theorem. To evaluate $\pi(n)$ by any of these
formulas is, of course, sheer lunacy.

4.22  $(b^{mn} - 1)/(b - 1) = \bigl((b^m - 1)/(b - 1)\bigr)(b^{mn-m} + \cdots + 1)$. [The only
prime numbers of the form $(10^p - 1)/9$ for $p < 2000$ occur when p = 2, 19,
23, 317, 1031.]

4.23  $\rho(2k + 1) = 0$; $\rho(2k) = \rho(k) + 1$, for $k \ge 1$. By induction we can show
that $\rho(n) = \rho(n - 2^m)$, if $n > 2^m$ and $m > \rho(n)$. The kth Hanoi move is disk
$\rho(k)$, if we number the disks 0, 1, ..., n - 1. This is clear if k is a power of 2.
And if $2^m < k < 2^{m+1}$, we have $\rho(k) < m$; moves k and $k - 2^m$ correspond in
the sequence that transfers m + 1 disks in $T_m + 1 + T_m$ steps.
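
The correspondence between the ruler function and the Hanoi moves can be seen in a few lines. A sketch, assuming the standard recursive solution (move the n - 1 smaller disks, then the largest, then the n - 1 smaller disks again); function names are ours.

```python
def ruler(k):
    """rho(k): the largest m such that 2^m divides k, for k >= 1."""
    return (k & -k).bit_length() - 1


def hanoi_disks(n):
    """Disk moved at each step when transferring n disks, numbered 0 (smallest) to n - 1."""
    if n == 0:
        return []
    smaller = hanoi_disks(n - 1)
    return smaller + [n - 1] + smaller


print(hanoi_disks(4))                      # [0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0]
print([ruler(k) for k in range(1, 16)])    # the same list
```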

4.24  The digit that contributes $dp^m$ to n contributes $dp^{m-1} + \cdots + d = d(p^m - 1)/(p - 1)$ to $\epsilon_p(n!)$, hence $\epsilon_p(n!) = (n - \nu_p(n))/(p - 1)$.
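
A short check of this formula, as a sketch with names of our own: `epsilon` counts the factors of p in n! by summing $\lfloor n/p^k \rfloor$, and `digit_sum` computes $\nu_p(n)$, the sum of the radix-p digits of n.

```python
def epsilon(p, n):
    """Exponent of the prime p in n!, as the sum of floor(n / p^k) over k >= 1."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total


def digit_sum(p, n):
    """nu_p(n): the sum of the digits of n in radix p."""
    s = 0
    while n:
        s, n = s + n % p, n // p
    return s


print(epsilon(5, 1000), (1000 - digit_sum(5, 1000)) // 4)   # 249 249
```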

4.25  $m \backslash\backslash n \iff m_p = 0$ or $m_p = n_p$, for all p. It follows that (a) is true.
But (b) fails, in our favorite example m = 12, n = 18. (This is a common
fallacy.)

4.26  Yes, since $\mathcal{G}_N$ defines a subtree of the Stern-Brocot tree.

4.27  Extend the shorter string with M's (since M lies alphabetically
between L and R) until both strings are the same length, then use dictionary
order. For example, the topmost levels of the tree are LL < LM < LR <
MM < RL < RM < RR. (Another solution is to append the infinite string
$RL^\infty$ to both inputs, and to keep comparing until finding L < R.)

4.28  We need to use only the first part of the representation:

     R    R    R    L    L    L     L     L     L     L     R     R      R      R      R       R
    1/1, 2/1, 3/1, 4/1, 7/2, 10/3, 13/4, 16/5, 19/6, 22/7, 25/8, 47/15, 69/22, 91/29, 113/36, 135/43.

The fraction 4/1 appears because it's a better upper bound than 1/0, not because
it's closer than 3/1. Similarly, 25/8 is a better lower bound than 3/1. The simplest
upper bounds and the simplest lower bounds all appear, but the next really
good approximation doesn't occur until just before the string of R's switches
back to L.
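
The list above can be regenerated by walking down the Stern-Brocot tree. This is only a sketch: it uses the floating-point value `math.pi`, which is accurate enough for denominators below 50, and the function name is ours. The letter recorded for each fraction is the branch taken from that fraction to the next one.

```python
from math import pi


def stern_brocot_approximations(x, max_denominator):
    """Fractions met on the Stern-Brocot path toward x, with denominators < max_denominator."""
    lo, hi = (0, 1), (1, 0)
    path, fractions = [], []
    while True:
        m, n = lo[0] + hi[0], lo[1] + hi[1]      # the mediant of lo and hi
        if n >= max_denominator:
            break
        fractions.append((m, n))
        if m < x * n:                            # m/n < x: go right
            path.append("R")
            lo = (m, n)
        else:                                    # m/n > x: go left
            path.append("L")
            hi = (m, n)
    return "".join(path), fractions


path, fractions = stern_brocot_approximations(pi, 50)
print(path)                                      # RRRLLLLLLLRRRRRR
print(", ".join(f"{m}/{n}" for m, n in fractions))
```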

4.29  $1/\alpha$. To get $1 - x$ from x in binary notation, we interchange 0 and 1; to
get $1/\alpha$ from $\alpha$ in Stern-Brocot notation, we interchange L and R. (The finite
cases must also be considered, but they must work since the correspondence
is order preserving.)

4.30  The m integers $x \in [A \,..\, A + m)$ are different mod m; hence their residues
$(x \bmod m_1, \ldots, x \bmod m_r)$ run through all $m_1 \ldots m_r = m$ possible values, one
of which must be $(a_1 \bmod m_1, \ldots, a_r \bmod m_r)$ by the pigeonhole principle.

4.31  A number in radix b notation is divisible by d if and only if the sum
of its digits is divisible by d, whenever $b \equiv 1 \pmod d$. This follows because
$(a_m \ldots a_0)_b = a_m b^m + \cdots + a_0 b^0 \equiv a_m + \cdots + a_0$.

4.32  The $\varphi(m)$ numbers $\{kn \bmod m \mid k \perp m \text{ and } 0 \le k < m\}$ are the
numbers $\{k \mid k \perp m \text{ and } 0 \le k < m\}$ in some order. Multiply them together and
divide by $\prod_{0 \le k < m,\; k \perp m} k$.

4.33  Obviously $h(1) = 1$. If $m \perp n$ then $h(mn) = \sum_{d \backslash mn} f(d)\,g(mn/d) = \sum_{c \backslash m,\, d \backslash n} f(cd)\,g((m/c)(n/d)) = \sum_{c \backslash m} \sum_{d \backslash n} f(c)\,g(m/c)\,f(d)\,g(n/d)$; this is
$h(m)\,h(n)$, since $c \perp d$ for every term in the sum.

4.34  $g(m) = \sum_{d \backslash m} f(d) = \sum_{d \backslash m} f(m/d) = \sum_{d \ge 1} f(m/d)$ if f(x) is zero
when x is not an integer.

4.35  The base cases are

$$I(0, n) = 0; \qquad I(m, 0) = 1.$$

When m, n > 0, there are two rules, where the first is trivial if m > n and
the second is trivial if m < n:

$$\begin{aligned}
I(m, n) &= I(m, n \bmod m) - \lfloor n/m \rfloor\, I(n \bmod m, m); \\
I(m, n) &= I(m \bmod n, n).
\end{aligned}$$
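
The recurrence can be exercised directly. The sketch below assumes that I(m, n) is meant to satisfy $I(m, n)\,m + I(n, m)\,n = \gcd(m, n)$ for nonnegative $m \ne n$; it applies the first rule when m < n and the second when m > n, and the function name is ours.

```python
from math import gcd


def I(m, n):
    """Coefficient of m in a representation I(m,n)*m + I(n,m)*n = gcd(m, n), for m != n."""
    if m == 0:
        return 0
    if n == 0:
        return 1
    if m < n:
        # first rule; it would be trivial if m > n
        return I(m, n % m) - (n // m) * I(n % m, m)
    # second rule (m > n); it would be trivial if m < n
    return I(m % n, n)


print(I(12, 18), I(18, 12), I(12, 18) * 12 + I(18, 12) * 18)   # -1 1 6
```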


4.36  A factorization of any of the given quantities into nonunits must have
$m^2 - 10n^2 = \pm 2$ or $\pm 3$, but this is impossible mod 10.

4.37  Let $a_n = 2^{-n} \ln(e_n - \frac12)$ and $b_n = 2^{-n} \ln(e_n + \frac12)$. Then

$$e_n = \lfloor E^{2^n} + \tfrac12 \rfloor \iff a_n \le \ln E < b_n.$$

And $a_{n-1} < a_n < b_n < b_{n-1}$, so we can take $E = \lim_{n \to \infty} e^{a_n}$. In fact, it
turns out that

$$E^2 = \frac32 \prod_{n \ge 1} \Bigl(1 + \frac{1}{(2e_n - 1)^2}\Bigr)^{1/2^n},$$

a product that converges rapidly to $(1.26408473530530111)^2$. But these
observations don't tell us what $e_n$ is, unless we can find another expression for E
that doesn't depend on Euclid numbers.

4.38  $a^n - b^n = (a^m - b^m)(a^{n-m} b^0 + a^{n-2m} b^m + \cdots + a^{n \bmod m} b^{n-m-n \bmod m}) + b^{m \lfloor n/m \rfloor}(a^{n \bmod m} - b^{n \bmod m})$.

4.39  If $a_1 \ldots a_t$ and $b_1 \ldots b_u$ are perfect squares, so is

$$a_1 \ldots a_t\, b_1 \ldots b_u / c_1^2 \ldots c_v^2,$$

where $\{a_1, \ldots, a_t\} \cap \{b_1, \ldots, b_u\} = \{c_1, \ldots, c_v\}$. (It can be shown, in fact, that
the sequence $\langle S(1), S(2), S(3), \ldots \rangle$ contains every nonprime positive integer
exactly once.)

4.40  Let $f(n) = \prod_{1 \le k \le n,\; p \nmid k} k = n!/(p^{\lfloor n/p \rfloor} \lfloor n/p \rfloor!)$ and $g(n) = n!/p^{\epsilon_p(n!)}$.
Then

$$g(n) = f(n)\, f(\lfloor n/p \rfloor)\, f(\lfloor n/p^2 \rfloor) \ldots = f(n)\, g(\lfloor n/p \rfloor).$$

Also $f(n) \equiv a_0!\,((p - 1)!)^{\lfloor n/p \rfloor} \equiv a_0!\,(-1)^{\lfloor n/p \rfloor} \pmod p$, and $\epsilon_p(n!) = \lfloor n/p \rfloor + \epsilon_p(\lfloor n/p \rfloor!)$. These recurrences make it easy to prove the result by induction.
(Several other solutions are possible.)

4.41  (a) If $n^2 \equiv -1 \pmod p$ then $(n^2)^{(p-1)/2} \equiv -1$; but Fermat says it's
+1. (b) Let $n = ((p - 1)/2)!$; we have $n \equiv (-1)^{(p-1)/2} \prod_{1 \le k < p/2} (p - k) = (p - 1)!/n$, hence $n^2 \equiv (p - 1)!$.

4.42  First we observe that $k \perp l \iff k \perp l + ak$ for any integer a, since
$\gcd(k, l) = \gcd(k, l + ak)$ by Euclid's algorithm. Now

$$m \perp n \text{ and } n' \perp n \iff mn' \perp n \iff mn' + nm' \perp n.$$

Similarly

$$m' \perp n' \text{ and } n \perp n' \iff mn' + nm' \perp n'.$$

Hence

$$m \perp n \text{ and } m' \perp n' \text{ and } n \perp n' \iff mn' + nm' \perp nn'.$$

4.43  We want to multiply by $L^{-1}R$, then by $R^{-1}L^{-1}RL$, then $L^{-1}R$, then
$R^{-2}L^{-1}RL^2$, etc.; the nth multiplier is $R^{-\rho(n)}L^{-1}RL^{\rho(n)}$, since we must cancel
$\rho(n)$ R's. And $R^{-m}L^{-1}RL^m = \begin{pmatrix} 0 & -1 \\ 1 & 2m+1 \end{pmatrix}$.

4.44  We can find the simplest rational number that lies in

$$[.3155 \,..\, .3165) = \bigl[\tfrac{631}{2000} \,..\, \tfrac{633}{2000}\bigr)$$

by looking at the Stern-Brocot representations of $\frac{631}{2000}$ and $\frac{633}{2000}$ and stopping
just before the former has L where the latter has R:

    (m1, n1, m2, n2) := (631, 2000, 633, 2000);
    while m1 > n1 or m2 < n2 do
        if m2 < n2 then (output(L); (n1, n2) := (n1, n2) - (m1, m2))
        else (output(R); (m1, m2) := (m1, m2) - (n1, n2)).

The output is LLLRRRRR = $\frac{6}{19} \approx .3158$. Incidentally, an average of .334
implies at least 287 at bats.
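
The loop above translates almost word for word. The following is a sketch in Python rather than the book's notation; the function name is ours, and the second half of the function, which turns the L/R string back into a fraction by taking mediants, is an addition for convenience.

```python
from fractions import Fraction


def simplest_between(m1, n1, m2, n2):
    """Stern-Brocot path and value of the simplest fraction in [m1/n1 .. m2/n2)."""
    path = []
    while m1 > n1 or m2 < n2:
        if m2 < n2:
            path.append("L")
            n1, n2 = n1 - m1, n2 - m2
        else:
            path.append("R")
            m1, m2 = m1 - n1, m2 - n2
    # Decode the path: start at 1/1 between 0/1 and 1/0 and take mediants.
    lo, hi = (0, 1), (1, 0)
    m, n = 1, 1
    for step in path:
        if step == "L":
            hi = (m, n)
        else:
            lo = (m, n)
        m, n = lo[0] + hi[0], lo[1] + hi[1]
    return "".join(path), Fraction(m, n)


print(simplest_between(631, 2000, 633, 2000))    # ('LLLRRRRR', Fraction(6, 19))
print(simplest_between(667, 2000, 669, 2000)[1]) # 96/287
```

The second call treats an average of .334 as the interval [.3335 .. .3345), which is where the 287 at bats come from.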
4.45  $x^2 \equiv x \pmod{10^n} \iff x(x - 1) \equiv 0 \pmod{2^n}$ and $x(x - 1) \equiv 0 \pmod{5^n} \iff x \bmod 2^n$ = [0 or 1] and $x \bmod 5^n$ = [0 or 1]. (The last step
is justified because $x(x - 1) \bmod 5 = 0$ implies that either x or x - 1 is a
multiple of 5, in which case the other factor is relatively prime to $5^n$ and can
be divided from the congruence.)

So there are at most four solutions, of which two (x = 0 and x = 1)
don't qualify for the title "n-digit number" unless n = 1. The other two
solutions have the forms x and $10^n + 1 - x$, and at least one of these numbers
is $\ge 10^{n-1}$. When n = 4 the other solution, 10001 - 9376 = 625, is not a
four-digit number. We expect to get two n-digit solutions for about 90% of
all n, but this conjecture has not been proved.

(Such self-reproducing numbers have been called "automorphic.")
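
For small n the four solutions can simply be listed. A brute-force sketch (the function name is ours); it is meant for n up to 5 or so, since it tries every residue.

```python
def automorphic(n):
    """All x with 0 <= x < 10^n and x*x congruent to x modulo 10^n."""
    modulus = 10 ** n
    return [x for x in range(modulus) if (x * x - x) % modulus == 0]


for n in range(1, 5):
    print(n, automorphic(n))
# 1 [0, 1, 5, 6]
# 2 [0, 1, 25, 76]
# 3 [0, 1, 376, 625]
# 4 [0, 1, 625, 9376]
```

In each row the two nontrivial solutions add up to $10^n + 1$, as stated above.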

4.46  (a) If $j'j - k'k = \gcd(j, k)$, we have $n^{k'k} n^{\gcd(j,k)} = n^{j'j} = 1$ and
$n^{k'k} = 1$. (b) Let $n = pq$, where p is the smallest prime divisor of n. If
$2^n \equiv 1 \pmod n$ then $2^n \equiv 1 \pmod p$. Also $2^{p-1} \equiv 1 \pmod p$; hence
$2^{\gcd(p-1,n)} \equiv 1 \pmod p$. But $\gcd(p - 1, n) = 1$ by the definition of p.

4.47  If $n^{m-1} \equiv 1 \pmod m$ we must have $n \perp m$. If $n^k \equiv n^j$ for some
$1 \le j < k < m$, then $n^{k-j} \equiv 1$ because we can divide by $n^j$. Therefore if the
numbers $n^1 \bmod m, \ldots, n^{m-1} \bmod m$ are not distinct, there is a $k < m - 1$
with $n^k \equiv 1$. The least such k divides m - 1, by exercise 46(a). But then $kq = (m - 1)/p$ for some prime p and some positive integer q; this is impossible,
since $n^{kq} \not\equiv 1$. Therefore the numbers $n^1 \bmod m, \ldots, n^{m-1} \bmod m$ are
distinct and relatively prime to m. Therefore the numbers 1, ..., m - 1 are
relatively prime to m, and m must be prime.

4.48  By pairing numbers up with their inverses, we can reduce the product
(mod m) to $\prod_{1 \le n < m,\; n^2 \bmod m = 1} n$. Now we can use our knowledge of the
solutions to $n^2 \bmod m = 1$. By residue arithmetic we find that the result is
m - 1 if m = 4, $p^k$, or $2p^k$ (p > 2); otherwise it's +1.

4.49  (a) Either $m < n$ ($\Phi(N-1)$ cases) or $m = n$ (one case) or $m > n$ ($\Phi(N-1)$ again). Hence $R(N) = 2\Phi(N-1) + 1$. (b) From (4.62) we get

$$2\Phi(N-1) + 1 = 1 + \sum_{d\ge1} \mu(d) \lfloor N/d \rfloor \lfloor N/d - 1 \rfloor ;$$

hence the stated result holds if and only if

$$\sum_{d\ge1} \mu(d) \lfloor N/d \rfloor = 1, \qquad \text{for } N \ge 1.$$

And this is a special case of (4.61) if we set $f(x) = [x \ge 1]$.

4.50  (a) If $f$ is any function,

$$\sum_{0\le k<m} f(k) = \sum_{d \backslash m} \sum_{0\le k<m} f(k)\,[d = \gcd(k,m)]$$
$$= \sum_{d \backslash m} \sum_{0\le k<m} f(k)\,[k/d \perp m/d]$$
$$= \sum_{d \backslash m} \sum_{0\le k<m/d} f(kd)\,[k \perp m/d]$$
$$= \sum_{d \backslash m} \sum_{0\le k<d} f(km/d)\,[k \perp d] ;$$

we saw a special case of this in the derivation of (4.63). An analogous derivation holds for $\prod$ instead of $\sum$. Thus we have

$$z^m - 1 = \prod_{0\le k<m} (z - \omega^k) = \prod_{d \backslash m} \prod_{0\le k<d,\; k \perp d} (z - \omega^{km/d}) = \prod_{d \backslash m} \Psi_d(z),$$

because $\omega^{m/d} = e^{2\pi i/d}$.

Part (b) follows from part (a) by the analog of (4.56) for products instead of sums. Incidentally, this formula shows that $\Psi_m(z)$ has integer coefficients, since $\Psi_m(z)$ is obtained by multiplying and dividing polynomials whose leading coefficient is 1.
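
The factorization $z^m - 1 = \prod_{d \backslash m} \Psi_d(z)$ gives a direct way to compute $\Psi_m$ with integer arithmetic only. The sketch below is one possible illustration, not the book's method: it divides $z^m - 1$ by $\Psi_d$ for every proper divisor $d$ of $m$. Polynomials are stored as coefficient lists, lowest degree first, and the function names are made up for this example.

```python
from functools import lru_cache

def poly_div_monic(num, den):
    """Exact division of integer polynomials; den must be monic."""
    num = list(num)
    k = len(den) - 1
    quot = [0] * (len(num) - k)
    for i in range(len(quot) - 1, -1, -1):
        quot[i] = num[i + k]          # leading coefficient, since den is monic
        for j, c in enumerate(den):
            num[i + j] -= quot[i] * c
    assert not any(num), "division was not exact"
    return quot

@lru_cache(maxsize=None)
def cyclotomic(m):
    """Coefficients of Psi_m(z), lowest degree first, as a tuple."""
    num = [-1] + [0] * (m - 1) + [1]  # z^m - 1
    for d in range(1, m):
        if m % d == 0:
            num = poly_div_monic(num, cyclotomic(d))
    return tuple(num)

print(cyclotomic(6))   # (1, -1, 1), that is, z^2 - z + 1
print(cyclotomic(12))  # (1, 0, -1, 0, 1), that is, z^4 - z^2 + 1
```

Because every divisor polynomial is monic, the long division never leaves the integers, which mirrors the closing remark of the answer.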

4.51  $(x_1 + \cdots + x_n)^p = \sum_{k_1+\cdots+k_n=p} p!/(k_1! \ldots k_n!)\, x_1^{k_1} \ldots x_n^{k_n}$, and the coefficient is divisible by $p$ unless some $k_j = p$. Hence $(x_1 + \cdots + x_n)^p \equiv x_1^p + \cdots + x_n^p \pmod p$. Now we can set all the $x$'s to 1, obtaining $n^p \equiv n$.

4.52  If $p > n$ there is nothing to prove. Otherwise $x \perp p$, so $x^{k(p-1)} \equiv 1 \pmod p$; this means that at least $\lfloor (n-1)/(p-1) \rfloor$ of the given numbers are multiples of $p$. And $(n-1)/(p-1) \ge n/p$ since $n \ge p$.

4.53  First show that if $m \ge 6$ and $m$ is not prime then $(m-2)! \equiv 0 \pmod m$. (If $m = p^2$, the product for $(m-2)!$ includes $p$ and $2p$; otherwise it includes $d$ and $m/d$ where $d < m/d$.) Next consider cases:

Case 0, $n < 5$. The condition holds for $n = 1$ only.

Case 1, $n \ge 5$ and $n$ is prime. Then $(n-1)!/(n+1)$ is an integer and it can't be a multiple of $n$.

Case 2, $n \ge 5$, $n$ is composite, and $n+1$ is composite. Then $n$ and $n+1$ divide $(n-1)!$, and $n \perp n+1$; hence $n(n+1) \backslash (n-1)!$.

Case 3, $n \ge 5$, $n$ is composite, and $n+1$ is prime. Then $(n-1)! \equiv 1 \pmod{n+1}$ by Wilson's theorem, and

$$\lceil (n-1)!/(n+1) \rceil = \bigl((n-1)! + n\bigr)/(n+1) ;$$

this is divisible by $n$.

Therefore the answer is: Either $n = 1$ or $n \ne 4$ is composite.
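
A brute-force check of this classification is easy to write. The sketch below assumes that the condition under study is "$n$ divides $\lceil (n-1)!/(n+1) \rceil$", which is what the formula in Case 3 suggests; the function names are illustrative only.

```python
from math import factorial

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def condition_holds(n):
    """Does n divide the ceiling of (n-1)!/(n+1)?"""
    ceiling = -(-factorial(n - 1) // (n + 1))  # integer ceiling division
    return ceiling % n == 0

def predicted(n):
    """The answer's classification: n = 1, or n composite and n != 4."""
    return n == 1 or (n > 1 and n != 4 and not is_prime(n))

for n in range(1, 60):
    assert condition_holds(n) == predicted(n)
```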

4.54  $\epsilon_2(1000!) > 500$ and $\epsilon_5(1000!) = 249$, hence $1000! = a \cdot 10^{249}$ for some even integer $a$. Since $1000 = (13000)_5$, exercise 40 tells us that $a \cdot 2^{249} = 1000!/5^{249} \equiv -1 \pmod 5$. Also $2^{249} \equiv 2$, hence $a \equiv 2$, hence $a \bmod 10 = 2$ or $7$; hence the answer is $2 \cdot 10^{249}$.
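
Exact integer arithmetic makes it possible to confirm these numbers directly. This is a small sketch, assuming the question is the value of $1000! \bmod 10^{250}$; `prime_exponent` is a hypothetical helper that applies Legendre's formula for the exponent of a prime in $n!$.

```python
from math import factorial

def prime_exponent(p, n):
    """Exponent of the prime p in n!, by Legendre's formula."""
    e = 0
    while n:
        n //= p
        e += n
    return e

print(prime_exponent(5, 1000))  # 249
print(prime_exponent(2, 1000))  # 994, comfortably more than 500

# The last 250 decimal digits of 1000! are a 2 followed by 249 zeros.
assert factorial(1000) % 10**250 == 2 * 10**249
```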

4.55  One way is to prove by induction that $P_{2n}/P_n^4(n+1)$ is an integer; this stronger result helps the induction go through. Another way is based on showing that each prime $p$ divides the numerator at least as often as it divides the denominator. This reduces to proving the inequality

$$\sum_{k=1}^{2n} \lfloor k/m \rfloor \ge 4 \sum_{k=1}^{n} \lfloor k/m \rfloor ,$$

which  follows  from 

L(2n  - 1 )/mJ  + |_2n/mJ  ^ [n/mj 

The  latter  is  true  when  0 <i  n < m,  and  both  sides  increase  by  4 when  n is 
increased  by  m. 

4.56  Let $f(m) = \sum_{k=1}^{2n-1} \min(k, 2n-k)\,[m \backslash k]$, $g(m) = \sum_{k=1}^{n-1} (2n-2k-1)\,[m \backslash (2k+1)]$. The number of times $p$ divides the numerator of the stated product is $f(p) + f(p^2) + f(p^3) + \cdots$, and the number of times $p$ divides the denominator is $g(p) + g(p^2) + g(p^3) + \cdots$. But $f(m) = g(m)$ whenever $m$ is odd, by exercise 2.32. The stated product therefore reduces to $2^{n(n-1)}$, by exercise 3.22.

4.57  The hint suggests a standard interchange of summation, since

$$\sum_{1\le m\le n} [d \backslash m] = \sum_{0<k\le n/d} [m = dk] = \lfloor n/d \rfloor .$$

Calling the hinted sum $\Sigma(n)$, we have

$$\Sigma(m+n) - \Sigma(m) - \Sigma(n) = \sum_{d \in S(m,n)} \varphi(d) .$$

On the other hand, we know from (4.54) that $\Sigma(n) = \frac12 n(n+1)$. Hence $\Sigma(m+n) - \Sigma(m) - \Sigma(n) = mn$.

4.58  The function $f(m)$ is multiplicative, and when $m = p^k$ it equals $1 + p + \cdots + p^k$. This is a power of 2 if and only if $p$ is a Mersenne prime and $k = 1$. For $k$ must be odd, and in that case the sum is

$$(1 + p)(1 + p^2 + p^4 + \cdots + p^{k-1})$$

and $(k-1)/2$ must be odd, etc. The necessary and sufficient condition is that $m$ be a product of distinct Mersenne primes.
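
The characterization can be tested by enumeration. The sketch below assumes, as the formula $1 + p + \cdots + p^k$ indicates, that $f(m)$ is the sum of the divisors of $m$; the empty product $m = 1$ is counted as a product of distinct Mersenne primes. All names are illustrative.

```python
from itertools import combinations
from math import prod

def divisor_sum(m):
    """Sum of all positive divisors of m."""
    return sum(d for d in range(1, m + 1) if m % d == 0)

def is_power_of_two(x):
    return x > 0 and x & (x - 1) == 0

LIMIT = 1000
MERSENNE_PRIMES = [3, 7, 31, 127]  # all Mersenne primes below LIMIT

# Every product of distinct Mersenne primes that stays within the limit.
products = {
    prod(subset)
    for r in range(len(MERSENNE_PRIMES) + 1)
    for subset in combinations(MERSENNE_PRIMES, r)
    if prod(subset) <= LIMIT
}

found = {m for m in range(1, LIMIT + 1) if is_power_of_two(divisor_sum(m))}
assert found == products
print(sorted(found))
```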

4.59  Proof of the hint: If $n = 1$ we have $x_1 = \alpha = 2$, so there's no problem. If $n > 1$ we can assume that $x_1 \le \cdots \le x_n$. Case 1: $x_1^{-1} + \cdots + x_{n-1}^{-1} + (x_n - 1)^{-1} \ge 1$ and $x_n > x_{n-1}$. Then we can find $\beta \ge x_n - 1 \ge x_{n-1}$ such that $x_1^{-1} + \cdots + x_{n-1}^{-1} + \beta^{-1} = 1$; hence $x_n \le \beta + 1 \le e_n$ and $x_1 \ldots x_n \le x_1 \ldots x_{n-1}(\beta + 1) \le e_1 \ldots e_n$, by induction. There is a positive integer $m$ such that $\alpha = x_1 \ldots x_n/m$; hence $\alpha \le e_1 \ldots e_n = e_{n+1} - 1$, and we have $x_1 \ldots x_n(\alpha + 1) \le e_1 \ldots e_n e_{n+1}$. Case 2: $x_1^{-1} + \cdots + x_{n-1}^{-1} + (x_n - 1)^{-1} \ge 1$ and $x_n = x_{n-1}$. Let $a = x_n$ and $a^{-1} + (a-1)^{-1} = (a-2)^{-1} + \zeta^{-1}$. Then we can show that $a \ge 4$ and $(a-2)(\zeta+1) \ge a^2$. So there's a $\beta \ge \zeta$ such that $x_1^{-1} + \cdots + x_{n-2}^{-1} + (a-2)^{-1} + \beta^{-1} = 1$; it follows by induction that $x_1 \ldots x_n \le x_1 \ldots x_{n-2}(a-2)(\zeta+1) \le x_1 \ldots x_{n-2}(a-2)(\beta+1) \le e_1 \ldots e_n$, and we can finish as before. Case 3: $x_1^{-1} + \cdots + x_{n-1}^{-1} + (x_n - 1)^{-1} < 1$. Let $a = x_n$, and let $a^{-1} + \alpha^{-1} = (a-1)^{-1} + \beta^{-1}$. It can be shown that $(a-1)(\beta+1) > a(\alpha+1)$, because this identity is equivalent to

$$a\alpha^2 - a^2\alpha + a\alpha - a^2 + \alpha + a > 0 ,$$

which is a consequence of $a\alpha(\alpha - a) + (1+a)\alpha \ge (1+a)\alpha > a^2 - a$. Hence we can replace $x_n$ and $\alpha$ by $a - 1$ and $\beta$, repeating this transformation until cases 1 or 2 apply.

Another consequence of the hint is that $1/x_1 + \cdots + 1/x_n < 1$ implies $1/x_1 + \cdots + 1/x_n \le 1/e_1 + \cdots + 1/e_n$; see exercise 16.

4.60  The main point is that $\theta < \frac23$. Then we can take $p_1$ sufficiently large (to meet the conditions below) and $p_n$ to be the least prime greater than $p_{n-1}^3$. With this definition let $a_n = 3^{-n} \ln p_n$ and $b_n = 3^{-n} \ln(p_n + 1)$. If we can show that $a_{n-1} \le a_n < b_n \le b_{n-1}$, we can take $P = \lim_{n\to\infty} e^{a_n}$ as in exercise 37. But this hypothesis is equivalent to $p_{n-1}^3 \le p_n < (p_{n-1} + 1)^3$. If there's no prime $p_n$ in this range, there must be a prime $p < p_{n-1}^3$ such that $p + cp^\theta > (p_{n-1} + 1)^3$. But this implies that $cp^\theta > 3p^{2/3}$, which is impossible when $p$ is sufficiently large.

We can almost certainly take $p_1 = 2$, since all available evidence indicates that the known bounds on gaps between primes are much weaker than the truth (see exercise 69). Then $p_2 = 11$, $p_3 = 1361$, $p_4 = 2521008887$, and

$$1.306377883863 < P < 1.306377883869 .$$
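
The first few primes of this sequence are small enough to recompute by trial division. The following sketch assumes the rule "$p_n$ is the least prime greater than $p_{n-1}^3$" with $p_1 = 2$; the floating-point estimate of $P$ from $p_4$ is only a lower approximation, good to roughly eleven decimal places.

```python
def is_prime(n):
    """Trial division; fast enough for numbers near 2.5 * 10^9."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def next_prime_after(x):
    """Least prime strictly greater than x."""
    candidate = x + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate

primes = [2]
for _ in range(3):
    primes.append(next_prime_after(primes[-1] ** 3))
print(primes)  # [2, 11, 1361, 2521008887]

# e^{a_4} = p_4^(1/81) is a lower bound for P.
lower = primes[3] ** (1 / 3**4)
print(lower)
```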

4.61  Let $\bar m$ and $\bar n$ be the right-hand sides; observe that $\bar m n' - m' \bar n = 1$, hence $\bar m \perp \bar n$. Also $\bar m/\bar n > m'/n'$, and $N = ((n+N)/n')n' - n \ge \bar n > ((n+N)/n' - 1)n' - n = N - n' \ge 0$. So we have $\bar m/\bar n \ge m''/n''$. If equality doesn't hold, we have $n'' = (\bar m n' - m' \bar n)n'' = n'(\bar m n'' - m'' \bar n) + \bar n(m'' n' - m' n'') \ge n' + \bar n > N$, a contradiction.

Incidentally, this exercise implies that $(m + m'')/(n + n'') = m'/n'$, although the former fraction is not always reduced.

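4.62  $2^{-1} + 2^{-2} + 2^{-3} - 2^{-6} - 2^{-7} + 2^{-12} + 2^{-13} - 2^{-20} - 2^{-21} + 2^{-30} + 2^{-31} - 2^{-42} - 2^{-43} + \cdots$ can be written

$$\frac12 + 3 \sum_{k\ge0} \bigl( 2^{-4k^2-6k-3} - 2^{-4k^2-10k-7} \bigr) .$$

Incidentally, this sum can be expressed in closed form using the “theta function” $\theta(z, \lambda) = \sum_k e^{-\pi\lambda k^2 + 2izk}$; we have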

e t— > ^ + |;0(  ^ In  2, 3i  In  2)  — T§g  9(  ^ In  2, 51  In  2) 


I have  discovered  a 
wonderful  proof  of 
Fermat's  Last  Theo- 
rem, but  there's  no 
room  for  it  here. 


4.63  Any $n > 2$ either has a prime divisor $d$ or is divisible by $d = 4$. In either case, a solution with exponent $n$ implies a solution $(a^{n/d})^d + (b^{n/d})^d = (c^{n/d})^d$ with exponent $d$. Since $d = 4$ has no solutions, $d$ must be prime.

The hint follows from the binomial theorem, since $a^p + (x-a)^p - pa^{p-1}x$ is a multiple of $x^2$ when $p$ is odd. Assume that $a \perp x$. If $x$ is not divisible by $p$, $x$ is relatively prime to $c^p/x$; hence $x = m^p$ for some $m$. If $x$ is divisible by $p$, then $c^p/x$ is divisible by $p$ but not by $p^2$, and $c^p$ has no other factors in common with $x$.


Therefore,  if  Fer- 
mat’s Last  Theorem 
is  false,  the  universe 
will  not  be  big 
enough  to  write 
down  any  numbers 
that  disprove  it. 


(The  values  of  a,  b,  c must,  in  fact,  be  even  higher  than  this  result 
indicates!  Inkeri  [160]  has  proved  that 

/ 2p3+p\P 

A sketch  of  his  proof  appears  in  [249,  pages  228-229],  a book  that  contains 
an  extensive  survey  of  progress  on  Fermat’s  Last  Theorem.) 


4.64  Equal  fractions  in  !PN  appear  in  “organ-pipe  order”: 


2m  4m  rm  3mm 

2n  ’ 4n  ’ ‘ ‘ ‘ ‘ rn  ’ ' ' ' ' 3n  ’ n ' 

Suppose  that  1PN  is  correct;  we  want  to  prove  that  !Pn  + i is  correct.  This 
means  that  if  kN  is  odd,  we  want  to  show  that 


k - 1 

N + 1 


^N.kN  ; 


if  kN  is  even,  we  want  to  show  that 


k - 1 


J*N,kN  1 3VkN 


N + 1 


J’n+N  JV|,kN  + 1 , 
In both cases it will be helpful to know the number of fractions that are
strictly  less  than  (k  — 1)/(N  + 1)  in  TN;  this  is 


n— 1 m 


r m k — 1 

N 

- 

’(k  - 1 )n~ 

N 

— V 

(k  — 1 )n  + N 

i Nn  N + lJ 

- 2_ 

n— 1 

N + l 

- 2_ 

n=0 

N + l 

(k  — 2)N  d-1  , 

1 h d 


= -(kN-d+1),  d = gcd(k— 1 , N+1 ), 


by  (3.32).  Furthermore,  the  number  of  fractions  equal  to  (k  — 1 )/(N  + 1 ) in 
IPn  that  should  precede  it  in  Tn+i  is  j (d  — 1 — [d  even]),  by  the  nature  of 
organ-pipe  order. 

If  kN  is  odd,  then  d is  even  and  ( k- 1 ) / ( N + 1 ) is  preceded  by  \ ( kN  - 1 ) 
elements  of  TN;  this  is  just  the  correct  number  to  make  things  work.  If  kN  is 
even,  than  d is  odd  and  (k  — 1 )/( N + 1)  is  preceded  by  j (kN  ) elements  of  (PN, 
If  d = 1 , none  of  these  equals  (k  — 1 )/(N  + 1 ) and  IPn+n  is  otherwise 
(k-  1 )/(N  + 1)  falls  between  two  equal  elements  and  (PN  kNis  (C.  S.  Peirce 
[230]  independently  discovered  the  Stern-Brocot  tree  at  about  the  same  time 
as  he  discovered  JV) 


4.65  The  analogous  question  for  the  (analogous)  Fermat  numbers  fn  is  a 
famous  unsolved  problem.  This  one  might  be  easier  or  harder. 

4.66  It is known that no square less than $36 \times 10^{18}$ divides a Mersenne number or Fermat number. But there has still been no proof of Schinzel’s conjecture that there exist infinitely many squarefree Mersenne numbers. It is not even known if there are infinitely many $p$ such that $p \backslash\backslash (a \pm b)$, where all prime factors of $a$ and $b$ are $\le 31$.


“No square less than $25 \times 10^{14}$ divides a Euclid number.”

— Ilan Vardi


4.67  M.  Szegedy  has  proved  this  conjecture  for  all  large  n;  see  [284'],  [77, 
pp.  78-79],  and  [49]. 

4.68  This  is  a much  weaker  conjecture  than  the  result  in  the  following  ex- 
ercise. 


4.69  Cramér [56] showed that this conjecture is plausible on probabilistic grounds, and computational experience bears this out: Brent [32] has shown that $P_{n+1} - P_n \le 602$ for $P_{n+1} < 2.686 \times 10^{12}$. But the much weaker bounds in exercise 60 are the best currently proved [221]. Exercise 68 has a “yes” answer if $P_{n+1} - P_n < 2P_n^{1/2}$ for all sufficiently large $n$. According to Guy [139, problem A8], Paul Erdős offers \$10,000 for proof that there are infinitely many $n$ such that

$$P_{n+1} - P_n > \frac{c \ln n \,\ln\ln n \,\ln\ln\ln\ln n}{(\ln\ln\ln n)^2}$$

for all $c > 0$.

What's $11^4$ in radix 11?

4.70  This holds if and only if $\nu_2(n) = \nu_3(n)$, according to exercise 24. The methods of [78] may help to crack this conjecture.

4.71  When $k = 3$ the smallest solution is $n = 4700063497 = 19 \cdot 47 \cdot 5263229$; no other solutions are known in this case.
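
The stated number is easy to verify with modular exponentiation. The sketch below rests on an assumption that is not visible in this answer: that the exercise concerns solutions of $2^n \equiv k \pmod n$. Under that reading, the check for $k = 3$ is a single call to Python's three-argument `pow`.

```python
n = 4700063497

# The factorization quoted in the answer.
assert 19 * 47 * 5263229 == n

# Assumed reading of the exercise: 2^n leaves remainder k = 3 modulo n.
remainder = pow(2, n, n)
print(remainder)
```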

4.72  This is known to be true for infinitely many values of $a$, including $-1$ (of course) and $0$ (not so obviously). Lehmer [199] has a famous conjecture that $\varphi(n) \backslash (n-1)$ if and only if $n$ is prime.

4.73  This  is  known  to  be  equivalent  to  the  Riemann  hypothesis  (that  all 
zeros  of  the  complex  zeta  function  with  real  part  between  0 and  1 have  real 
part  equal  to  1/2). 

4.74  Experimental evidence suggests that there are about $p(1 - 1/e)$ distinct values, just as if the factorials were randomly distributed modulo $p$.

5.1  $(11)_r^4 = (14641)_r$, in any number system of radix $r \ge 7$, because of the binomial theorem.
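
A short loop confirms the radix claim. This sketch converts $(r+1)^4$ to radix $r$ and compares the digit list with 1, 4, 6, 4, 1; the helper name `digits` is arbitrary.

```python
def digits(x, r):
    """Digits of the positive integer x in radix r, most significant first."""
    out = []
    while x:
        out.append(x % r)
        x //= r
    return out[::-1]

# (11)_r is the number r + 1; its fourth power has digits 1,4,6,4,1 once r >= 7.
for r in range(7, 40):
    assert digits((r + 1) ** 4, r) == [1, 4, 6, 4, 1]

# With r = 6 the binomial coefficient 6 is no longer a single digit.
print(digits(7 ** 4, 6))  # [1, 5, 0, 4, 1]
```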

5.2  The ratio $\binom{n}{k+1} / \binom{n}{k} = (n-k)/(k+1)$ is $\le 1$ when $k \ge \lfloor n/2 \rfloor$ and $\ge 1$ when $k < \lceil n/2 \rceil$, so the maximum occurs when $k = \lfloor n/2 \rfloor$ and $k = \lceil n/2 \rceil$.

5.3  Expand into factorials. Both products are equal to $f(n)/f(n-k)f(k)$, where $f(n) = (n+1)!\, n!\, (n-1)!$.

5.4  $\binom{-1}{k} = (-1)^k \binom{k+1-1}{k} = (-1)^k \binom{k}{k} = (-1)^k [k \ge 0]$.

5.5  If $0 < k < p$, there’s a $p$ in the numerator of $\binom{p}{k}$ with nothing to cancel it in the denominator. Since $\binom{p}{k} = \binom{p-1}{k} + \binom{p-1}{k-1}$, we must have $\binom{p-1}{k} \equiv (-1)^k \pmod p$, for $0 \le k < p$.

5.6  The crucial step (after second down) should be


The  original  derivation  forgot  to  include  this  extra  term,  which  is  [n  = 0] . 




5.7  Yes, because $r^{\underline{-k}} = (-1)^k / (-r-1)^{\underline{k}}$. We also have

$$r^{\overline{k}} (r + \tfrac12)^{\overline{k}} = (2r)^{\overline{2k}} / 2^{2k} .$$

5.8  $f(k) = (k/n - 1)^n$ is a polynomial of degree $n$ whose leading coefficient is $n^{-n}$. By (5.40), the sum is $n!/n^n$. When $n$ is large, Stirling’s approximation says that this is approximately $\sqrt{2\pi n}/e^n$. (This is quite different from $(1 - 1/e)$, which is what we get if we use the approximation $(1 - k/n)^n \sim e^{-k}$, valid for fixed $k$ as $n \to \infty$.)
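
The closed form $n!/n^n$ can be confirmed with exact rational arithmetic. This sketch assumes the sum in question is $\sum_k \binom{n}{k} (-1)^k (1 - k/n)^n$, which is the sum that the polynomial $f(k) = (k/n - 1)^n$ and identity (5.40) point to.

```python
from fractions import Fraction
from math import comb, factorial

def alternating_sum(n):
    """Exact value of sum_k C(n,k) (-1)^k (1 - k/n)^n."""
    return sum(
        (-1) ** k * comb(n, k) * Fraction(n - k, n) ** n
        for k in range(n + 1)
    )

for n in range(1, 9):
    assert alternating_sum(n) == Fraction(factorial(n), n ** n)

print(alternating_sum(2))  # 1/2
```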

5.9  $\mathcal{E}_t(z)^t = \sum_{k\ge0} t(tk+t)^{k-1} z^k/k! = \sum_{k\ge0} (k+1)^{k-1} (tz)^k/k! = \mathcal{E}_1(tz)$, by (5.60).

5.10  $\sum_{k\ge0} 2z^k/(k+2) = F(2, 1; 3; z)$, since $t_{k+1}/t_k = (k+2)z/(k+3)$.

5.11  The first is Besselian and the second is Gaussian:

$$z^{-1} \sin z = \sum_{k\ge0} (-1)^k z^{2k}/(2k+1)! = F(1; 1, \tfrac32; -z^2/4) ;$$

$$z^{-1} \arcsin z = \sum_{k\ge0} z^{2k} (\tfrac12)^{\overline{k}} / (2k+1)k! = F(\tfrac12, \tfrac12; \tfrac32; z^2) .$$

But not Imbesselian.

5.12  (a) Yes, the term ratio is $n$. (b) No, the value should be 1 when $k = 0$; but $(k+1)^n$ works, if $n$ is an integer. (c) Yes, the term ratio is $(k+1)(k+3)/(k+2)$. (d) No, the term ratio is $1 + 1/(k+1)H_k$; and $H_k \sim \ln k$ isn’t a rational function. (e) Yes, the term ratio is

$$\frac{t(k+1)}{t(k)} \Big/ \frac{T(n-k)}{T(n-k-1)} .$$

(f) Not always; e.g., not when $t(k) = 2^k$ and $T(k) = 1$. (g) Yes, the term ratio can be written

$$\frac{a\,t(k+1)/t(k) + b\,t(k+2)/t(k) + c\,t(k+3)/t(k)}{a + b\,t(k+1)/t(k) + c\,t(k+2)/t(k)} ,$$

and $t(k+m)/t(k) = \bigl(t(k+m)/t(k+m-1)\bigr) \ldots \bigl(t(k+1)/t(k)\bigr)$ is a rational function of $k$.

5.13  $R_n = n!^{\,n+1}/P_n^2 = Q_n/P_n = Q_n^2/n!^{\,n+1}$.

5.14  The first factor in (5.25) is $\binom{l-k}{l-k-m}$ when $k \le l$, and this is $(-1)^{l-k-m} \binom{-m-1}{l-k-m}$. The sum for $k \le l$ is the sum over all $k$, since $m \ge 0$. (The condition $n \ge 0$ isn’t really needed, although $k$ must assume negative values if $n < 0$.)

To go from (5.25) to (5.26), first replace $s$ by $-1 - n - q$.

5.15  If $n$ is odd, the sum is zero, since we can replace $k$ by $n - k$. If $n = 2m$, the sum is $(-1)^m (3m)!/m!^3$, by (5.29) with $a = b = c = m$.
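
Both cases are easy to confirm numerically. The sketch below assumes the sum is $\sum_k \binom{n}{k}^3 (-1)^k$, which is the form that the substitution $k \to n-k$ and the value $(-1)^m (3m)!/m!^3$ suggest.

```python
from math import comb, factorial

def alternating_cubes(n):
    """sum_k (-1)^k C(n,k)^3, computed exactly."""
    return sum((-1) ** k * comb(n, k) ** 3 for k in range(n + 1))

for m in range(0, 8):
    # Even case n = 2m: the closed form from the answer.
    assert alternating_cubes(2 * m) == (-1) ** m * factorial(3 * m) // factorial(m) ** 3
    # Odd case: terms k and n - k cancel in pairs.
    assert alternating_cubes(2 * m + 1) == 0

print(alternating_cubes(4))  # 90
```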
5.16  This is just $(2a)!\,(2b)!\,(2c)!/(a+b)!\,(b+c)!\,(c+a)!$ times (5.29), if we write the summands in terms of factorials.

5.17  $\binom{2n-1/2}{n} = \binom{4n}{2n}/2^{2n}$; $\binom{2n-1/2}{2n} = \binom{4n}{2n}/2^{4n}$; so $\binom{2n-1/2}{n} = 2^{2n} \binom{2n-1/2}{2n}$.

5.18  $\binom{3r}{3k} \binom{3k}{k,k,k} / 3^{3k}$.

5.19  $\mathcal{B}_{1-t}(-z)^{-1} = \sum_{k\ge0} \binom{k-tk-1}{k} \bigl(-1/(k-tk-1)\bigr) (-z)^k$, by (5.60), and this is $\sum_{k\ge0} \binom{tk}{k} \bigl(1/(tk-k+1)\bigr) z^k = \mathcal{B}_t(z)$.

5.20  It equals $F(-a_1, \ldots, -a_m; -b_1, \ldots, -b_n; (-1)^{m+n} z)$; see exercise 2.17.

5.21  $\lim_{n\to\infty} (n+m)^{\underline{m}} / n^m = 1$.

5.22  Multiplying and dividing instances of (5.83) gives

$$\frac{(-1/2)!}{x!\,(x-1/2)!} = \lim_{n\to\infty} \binom{n+x}{n} \binom{n+x-1/2}{n} n^{-2x} \Big/ \binom{n-1/2}{n} = \lim_{n\to\infty} \binom{2n+2x}{2n} n^{-2x} ,$$

by (5.34) and (5.36). Also

$$1/(2x)! = \lim_{n\to\infty} \binom{2n+2x}{2n} (2n)^{-2x} .$$

Hence, etc. The Gamma function equivalent, incidentally, is

$$\Gamma(x)\,\Gamma(x + \tfrac12) = \Gamma(2x)\,\Gamma(\tfrac12)/2^{2x-1} .$$

5.23  $(-1)^n n¡$; see (5.50).

5.24  This sum is $\binom{n}{m} F(m-n, -m; \tfrac12; 1) = \binom{2n}{2m}$, by (5.35) and (5.93).
5.25  This is equivalent to the easily proved identity

$$(a - b) \frac{a^{\overline{k}}}{(b+1)^{\overline{k}}} = a \frac{(a+1)^{\overline{k}}}{(b+1)^{\overline{k}}} - b \frac{a^{\overline{k}}}{b^{\overline{k}}}$$

as well as to the operator formula $a - b = (\vartheta + a) - (\vartheta + b)$. Similarly, we have

$$(a_1 - a_2)\, F(a_1, a_2, a_3, \ldots, a_m; b_1, \ldots, b_n; z) = a_1 F(a_1+1, a_2, a_3, \ldots, a_m; b_1, \ldots, b_n; z) - a_2 F(a_1, a_2+1, a_3, \ldots, a_m; b_1, \ldots, b_n; z) ,$$

because $a_1 - a_2 = (a_1 + k) - (a_2 + k)$. If $a_1 - b_1$ is a nonnegative integer $d$, this second identity allows us to express $F(a_1, \ldots, a_m; b_1, \ldots, b_n; z)$ as a linear combination of $F(a_2 + j, a_3, \ldots, a_m; b_2, \ldots, b_n; z)$ for $0 \le j \le d$, thereby eliminating an upper parameter and a lower parameter. Thus, for example, we get closed forms for $F(a, b; a-1; z)$, $F(a, b; a-2; z)$, etc.

Gauss  [116,  §7]  derived  analogous  relations  between  F(a,  b;  c;z)  and 
any  two  “contiguous”  hypergeometrics  in  which  a parameter  has  been  changed 
by  ±1  . Rainville  [242 7 ] generalized  this  to  cases  with  more  parameters. 

5.26  If the term ratio in the original hypergeometric series is $t_{k+1}/t_k = r(k)$, the term ratio in the new one is $t_{k+2}/t_{k+1} = r(k+1)$. Hence

$$F(a_1, \ldots, a_m; b_1, \ldots, b_n; z) = 1 + \frac{a_1 \ldots a_m\, z}{b_1 \ldots b_n} F(a_1+1, \ldots, a_m+1, 1; b_1+1, \ldots, b_n+1, 2; z) .$$

5.27  This is the sum of the even terms of $F(2a_1, \ldots, 2a_m; 2b_1, \ldots, 2b_n; z)$. We have $(2a)^{\overline{2k+2}}/(2a)^{\overline{2k}} = 4(k+a)(k+a+\tfrac12)$, etc.

5.28  We have $F(a, b; c; z) = (1-z)^{-a} F(a, c-b; c; \frac{-z}{1-z}) = (1-z)^{-a} F(c-b, a; c; \frac{-z}{1-z}) = (1-z)^{c-a-b} F(c-a, c-b; c; z)$. (Euler proved the identity by showing that both sides satisfy the same differential equation. The reflection law is often attributed to Euler, but it does not seem to appear in his published papers.)

5.29  The coefficients of $z^n$ are equal, by Vandermonde’s convolution. (Kummer’s original proof was different: He considered $\lim_{m\to\infty} F(m, b-a; b; z/m)$ in the reflection law (5.101).)

5.30  Differentiate again to get $z(1-z)F''(z) + (2-3z)F'(z) - F(z) = 0$. Therefore $F(z) = F(1, 1; 2; z)$, by (5.108).

5.31  The condition $f(k) = cT(k+1) - cT(k)$ implies that $f(k+1)/f(k) = \bigl(T(k+2)/T(k+1) - 1\bigr) / \bigl(1 - T(k)/T(k+1)\bigr)$ is a rational function of $k$.

5.32  When summing a polynomial in $k$, Gosper’s method reduces to the “method of undetermined coefficients.” We have $q(k) = r(k) = 1$, and we try to solve $p(k) = s(k+1) - s(k)$. The method suggests letting $s(k)$ be a polynomial whose degree is $d = \deg(p) + 1$.

5.33  The solution to $k = (k-1)s(k+1) - (k+1)s(k)$ is $s(k) = -k + \tfrac12$; hence the answer is $(1-2k)/2k(k-1) + C$.

5.34  The limiting relation holds because all terms for $k > c$ vanish, and $\epsilon - c$ cancels with $-c$ in the limit of the other terms. Therefore the second partial sum is $\lim_{\epsilon\to0} F(-m, -n; \epsilon - m; 1) = \lim_{\epsilon\to0} (\epsilon + n - m)^{\overline{m}} / (\epsilon - m)^{\overline{m}}$.

5.35  (a) $2^{-n} 3^n [n \ge 0]$. (b) $(1 - \tfrac12)^{-k-1} [k \ge 0] = 2^{k+1} [k \ge 0]$.

5.36  The sum of the digits of $m + n$ is the sum of the digits of $m$ plus the sum of the digits of $n$, minus $p - 1$ times the number of carries, because each carry decreases the digit sum by $p - 1$.
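
The digit-sum relation can be exercised directly. This is a small sketch, assuming all digit sums are taken in radix $p$; the two helper functions are illustrative names, and the addition is simulated digit by digit so that the carries can be counted.

```python
def digit_sum(n, p):
    """Sum of the radix-p digits of n."""
    s = 0
    while n:
        s += n % p
        n //= p
    return s

def count_carries(m, n, p):
    """Number of carries when m and n are added in radix p."""
    carries = carry = 0
    while m or n:
        carry = 1 if m % p + n % p + carry >= p else 0
        carries += carry
        m //= p
        n //= p
    return carries

for p in (2, 3, 5, 10):
    for m in range(60):
        for n in range(60):
            lost = digit_sum(m, p) + digit_sum(n, p) - digit_sum(m + n, p)
            assert lost == (p - 1) * count_carries(m, n, p)
```

For example, adding 58 and 67 in radix 10 produces two carries, and the digit sums 13 and 13 drop to 8, a loss of $2 \cdot 9$.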

5.37  Dividing the first identity by $n!$ yields $\binom{x+y}{n} = \sum_k \binom{x}{k} \binom{y}{n-k}$, Vandermonde’s convolution. The second identity follows, for example, from the formula $x^{\overline{k}} = (-1)^k (-x)^{\underline{k}}$ if we negate both $x$ and $y$.

The boxed sentence on the other side of this page is true.

5.38  Choose $c$ as large as possible such that $\binom{c}{3} \le n$. Then $0 \le n - \binom{c}{3} < \binom{c+1}{3} - \binom{c}{3} = \binom{c}{2}$; replace $n$ by $n - \binom{c}{3}$ and continue in the same fashion. Conversely, any such representation is obtained in this way. (We can do the same thing with

$$n = \binom{a_1}{1} + \binom{a_2}{2} + \cdots + \binom{a_m}{m}, \qquad 0 \le a_1 < a_2 < \cdots < a_m$$

for any fixed $m$.)
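
The greedy procedure translates into a few lines of code. The sketch below assumes the representation sought is $n = \binom{a}{1} + \binom{b}{2} + \binom{c}{3}$ with $0 \le a < b < c$, and generalizes to any fixed number of terms $m$ as in the parenthetical remark; the function name is hypothetical.

```python
from math import comb

def binomial_representation(n, m=3):
    """Greedy digits [a_1, ..., a_m] with n = sum of C(a_j, j)."""
    out = []
    for j in range(m, 0, -1):
        c = j - 1                      # C(j-1, j) = 0 never exceeds n
        while comb(c + 1, j) <= n:     # take c as large as possible
            c += 1
        out.append(c)
        n -= comb(c, j)
    return out[::-1]

for n in range(300):
    a, b, c = binomial_representation(n)
    assert 0 <= a < b < c
    assert comb(a, 1) + comb(b, 2) + comb(c, 3) == n

print(binomial_representation(0))   # [0, 1, 2]
print(binomial_representation(100))
```

The strict inequalities come out automatically, because the remainder after subtracting $\binom{c}{3}$ is less than $\binom{c}{2}$.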

5.39  $x^m y^n = \sum_{k=1}^{m} \binom{m+n-1-k}{n-1} a^n b^{m-k} x^k + \sum_{k=1}^{n} \binom{m+n-1-k}{m-1} a^{n-k} b^m y^k$ for all $mn > 0$, by induction on $m + n$.


5.40  (_1  tm+ivn 

/m — rk-s-  1 


It.,  £,",«)(" 


m rk  s-l\  —1  1 im+1  V~  n ((m  i|k  1|  s-1' 

m-j  I T '>  Z_k=m  m 

))  = Mr+,(r rrr"))  = rm+s)-a). 


5.41  $\sum_{k\ge0} n!/(n-k)!\,(n+k+1)! = \bigl(n!/(2n+1)!\bigr) \sum_{k>n} \binom{2n+1}{k}$, which is $2^{2n} n!/(2n+1)!$.


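5.42  We treat $n$ as an indeterminate real variable. Gosper’s method with $q(k) = k + 1$ and $r(k) = k - 1 - n$ has the solution $s(k) = 1/(n+2)$; hence the desired indefinite sum is $(-1)^{x-1} \frac{n+1}{n+2} \big/ \binom{n+1}{x}$. And

$$\sum_{k=0}^{n} (-1)^k \Big/ \binom{n}{k} = (-1)^{x-1} \frac{n+1}{n+2} \Big/ \binom{n+1}{x} \;\Bigg|_{0}^{n+1} = 2\, \frac{n+1}{n+2}\, [n \text{ even}] .$$

This exercise, incidentally, implies the formula

$$\frac{1}{n \binom{n-1}{k}} = \frac{1}{(n+1) \binom{n}{k+1}} + \frac{1}{(n+1) \binom{n}{k}} ,$$

a “dual” to the basic recurrence (5.8).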

5.43  After  the  hinted  first  step  we  can  apply  (5.21)  and  sum  on  k.  Then 
(5.21)  applies  again  and  Vandermonde’s  convolution  finishes  the  job.  (A  com- 
binatorial proof  of  this  identity  has  been  given  by  Andrews  [10].  There’s  a 
quick  way  to  go  from  this  identity  to  a proof  of  (5.29),  explained  in  [173, 
exercise  1.2.6-62].) 
5.44  Cancellation of factorials shows that

$$\binom{m}{j} \binom{n}{k} \binom{m+n}{m} = \binom{m+n-j-k}{m-j} \binom{j+k}{j} \binom{m+n}{j+k} ,$$

so  the  second  sum  is  1 /( m^n)  times  the  first.  We  can  show  that  the  first  sum 
is  (aQb)  (m~m  b b)’  whenever  n b,  even  if  m < a:  Let  a and  b be  fixed 
and  call  the  first  sum  S(  m,  T).  Identity  (5.32)  covers  the  case  n = b,  and 
we  have  S(m,n)  = S(m,u  1)  + S(m-  l,n)  + ( — i )m+n(m^n)  (^)  (^)  since 
= (m+nml^  k)  + (m_rn-?-j_k)1  The  resuh  follows  by  induction  on 
m+  n,  since  = & when  n > b and  the  case  m = 0 is  trivial.  By  symmetry, 
the  formula  (a^b)  b)  holds  whenever  m y a,  even  if  n < b. 

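5.45  According to (5.9), $\sum_{k\le n} \binom{k-1/2}{k} = \binom{n+1/2}{n}$. If this form isn’t “closed” enough, we can apply (5.35) and get $(2n+1) \binom{2n}{n} 4^{-n}$.

5.46  By (5.69), this convolution is the negative of the coefficient of $z^{2n}$ in $\mathcal{B}_{-1}(z)\,\mathcal{B}_{-1}(-z)$. Now $\bigl(2\mathcal{B}_{-1}(z) - 1\bigr)\bigl(2\mathcal{B}_{-1}(-z) - 1\bigr) = \sqrt{1 - 16z^2}$; hence $\mathcal{B}_{-1}(z)\,\mathcal{B}_{-1}(-z) = \tfrac14 \sqrt{1 - 16z^2} + \tfrac12 \mathcal{B}_{-1}(z) + \tfrac12 \mathcal{B}_{-1}(-z) - \tfrac14$. By the binomial theorem,

$$(1 - 16z^2)^{1/2} = \sum_n \binom{1/2}{n} (-16)^n z^{2n} = -\sum_n \binom{2n}{n} \frac{4^n z^{2n}}{2n-1} ,$$

so the answer is $\binom{2n}{n} 4^{n-1}/(2n-1) + \binom{4n-1}{2n}/(4n-1)$.

The boxed sentence on the other side of this page is false.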

5.47  It’s the coefficient of $z^n$ in $\bigl(\mathcal{B}_r(z)^s/Q_r(z)\bigr) \bigl(\mathcal{B}_r(z)^{-s}/Q_r(z)\bigr) = 1/Q_r(z)^2$, where $Q_r(z) = 1 - r + r\mathcal{B}_r(z)^{-1}$, by (5.61).

5.48  $F(2n+2, 1; n+2; \tfrac12) = 2^{2n+1} / \binom{2n+1}{n+1}$, a special case of (5.111).

5.49  Saalschütz’s identity (5.97) yields

$$\binom{x+n}{n} \frac{y}{y+n} F(-x, -n, -n-y; -x-n, 1-n-y; 1) = \frac{(y-x)^{\overline{n}}}{(y+1)^{\overline{n}}} .$$

5.50  The  left-hand  side  is 


y Q h y 

2—  . Z_ 


k! 


m^O 


k + a + m - 1 


= L*°L 


k uk 


aKb 
ck  k! 


(-I! 


n^O  k^O 

and  the  coefficient  of  zn  is 


n+a  — 1 V f a,b,  — n 


n + a — 1 
n-k 


n 


c,  a 


a"  = (c-_b)T 


n! 
by Vandermonde’s convolution (5.92).

5.51  (a) Reflection gives $F(a, -n; 2a; 2) = (-1)^n F(a, -n; 2a; 2)$. (Incidentally, this formula implies the remarkable identity $\Delta^{2m+1} f(0) = 0$, when $f(n) = 2^n x^{\underline{n}} / (2x)^{\underline{n}}$.)

(b)  The  term-by-term  limit  is  X.o<k<m  Ck)  '’m+l’  k ( — 2)k  plus  an  addi- 
tional term  for  k = 2m  — 1:  the  additional  term  is 


(-m)„.  (-1)  (l)...(m)  (—2m  + 1) 


)22 


m+l 


(-1 


2m) . (-1)  (2m 
m!  rn!22m+1 


D! 


,m+l 


(2m)! 


m / 


hence,  by  (5.104),  this  limit  is  — 1/(  ^ , the  negative  of  what  we  had. 

5.52  The terms of both series are zero for $k > N$. This identity corresponds to replacing $k$ by $N - k$. Notice that

$$a^{\overline{N}} = a^{\overline{N-k}} (a + N - k)^{\overline{k}} = a^{\overline{N-k}} (a + N - 1)^{\underline{k}} = a^{\overline{N-k}} (1 - a - N)^{\overline{k}} (-1)^k .$$

5.53  When $b = -\tfrac12$, the left side of (5.110) is $1 - 2z$ and the right side is $(1 - 4z + 4z^2)^{1/2}$, independent of $a$. The right side is the formal power series

$$1 + \binom{1/2}{1} 4z(z-1) + \binom{1/2}{2} 16z^2(z-1)^2 + \cdots ,$$

which can be expanded and rearranged to give $1 - 2z + 0z^2 + 0z^3 + \cdots$; but the rearrangement involves divergent series in its intermediate steps when $z = 1$, so it is not legitimate.

5.54  If  m + n is  odd,  say  2N  1,  we  want  to  show  that 


lim  F 

e— >0 


/N  — m—  j,  — N + e 
V -m+e 


0, 


Equation  (5.92)  applies,  since  -m  + e > -m  — j + e,  and  the  denominator 
factor  r(c-b)=  T(N— m)  is  infinite  since  N <C  m;  the  other  factors  are  finite. 
Otherwise  m + n is  even;  setting  n = m 2N  we  have 


lim  F 

e— >0 


/— N,  N-m-^  + e 
V -m+e 


(N  - 1/2)- 


by (5.93). The remaining job is to show that

$$\binom{m}{m-2N} \frac{(N-1/2)!}{(-1/2)!} \frac{(m-N)!}{m!} = \binom{m-N}{m-2N} 2^{-2N} ,$$

and this is the case $x = N$ of exercise 22.

5.55  Let $Q(k) = (k + A_1) \ldots (k + A_M) Z$ and $R(k) = (k + B_1) \ldots (k + B_N)$. Then $t(k+1)/t(k) = P(k)\,Q(k-1)/P(k-1)\,R(k)$, where $P(k) = Q(k) - R(k)$ is a nonzero polynomial.

5.56  The solution to $-(k+1)(k+2) = s(k+1) + s(k)$ is $s(k) = -\tfrac12 k^2 - k - \tfrac14$; hence $\sum \binom{-3}{k}\,\delta k = \tfrac18 (-1)^{k-1} (2k^2 + 4k + 1) + C$. Also


(-1 


\k-1 


k+1 

k + 2 

2 

L 2 

4 

(-I)"-1 


k+  1 


l-L(-l)1' 


k + 2 - 


1-MI 


g (2k2+4k+l)  + - 


5.57  We have $t(k+1)/t(k) = (k-n)(k+1+\theta)(-z)/(k+1)(k+\theta)$. Therefore we let $p(k) = k + \theta$, $q(k) = (k-n)(-z)$, $r(k) = k$. The secret function $s(k)$ must be a constant $\alpha_0$, and we have

$$k + \theta = \bigl(-z(k-n) - k\bigr)\, \alpha_0 ;$$

hence $\alpha_0 = -1/(1+z)$ and $\theta = -nz/(1+z)$. The sum is

$$-\frac{n}{1+z} \binom{n-1}{k-1} z^k + C .$$

(The special case $z = 1$ was mentioned in (5.18); the general case is equivalent to (5.131).)

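5.58  If $m > 0$ we can replace $\binom{k}{m}$ by $\frac{k}{m} \binom{k-1}{m-1}$ and derive the formula $T_{m,n} = \frac{n}{m} T_{m-1,n-1} - \frac{1}{m} \binom{n-1}{m}$. The summation factor $\binom{n}{m}^{-1}$ is therefore appropriate:

$$\frac{T_{m,n}}{\binom{n}{m}} = \frac{T_{m-1,n-1}}{\binom{n-1}{m-1}} - \frac{1}{m} + \frac{1}{n} .$$

We can unfold this to get

$$\frac{T_{m,n}}{\binom{n}{m}} = T_{0,n-m} - H_m + H_n - H_{n-m} .$$

Finally $T_{0,n-m} = H_{n-m}$, so $T_{m,n} = \binom{n}{m} (H_n - H_m)$. (It’s also possible to derive this result by using generating functions; see Example 2 in Section 7.5.)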


5.59  $\sum_{j\ge0,\,k\ge1} \binom{n}{j} [j = \lfloor \log_m k \rfloor] = \sum_{j\ge0,\,k\ge1} \binom{n}{j} [m^j \le k < m^{j+1}]$, which is $\sum_{j\ge0} \binom{n}{j} (m^{j+1} - m^j) = (m-1) \sum_{j\ge0} \binom{n}{j} m^j = (m-1)(m+1)^n$.

The boxed sentence on the other side of this page is not a sentence.

5.60  $\binom{2n}{n} \approx 4^n/\sqrt{\pi n}$ is the case $m = n$ of


5.61  Let $\lfloor n/p \rfloor = q$ and $n \bmod p = r$. The polynomial identity $(x+1)^p \equiv x^p + 1 \pmod p$ implies that

$$(x+1)^{pq+r} \equiv (x+1)^r (x^p + 1)^q \pmod p .$$

The coefficient of $x^m$ on the left is $\binom{n}{m}$. On the right it’s $\sum_k \binom{r}{m-pk} \binom{q}{k}$, which is just $\binom{r}{m \bmod p} \binom{q}{\lfloor m/p \rfloor}$ because $0 \le r < p$.
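
Applying this congruence repeatedly, one radix-$p$ digit at a time, gives a simple way to reduce binomial coefficients modulo a prime. The function below is a sketch of that iteration (the name is illustrative), checked against direct computation.

```python
from math import comb

def binomial_mod_p(n, m, p):
    """C(n, m) mod p for a prime p, using one digit of n and m per step."""
    result = 1
    while n or m:
        # C(n, m) = C(n mod p, m mod p) * C(floor(n/p), floor(m/p))  (mod p)
        result = result * comb(n % p, m % p) % p
        n //= p
        m //= p
    return result

for p in (2, 3, 5, 7):
    for n in range(50):
        for m in range(50):
            assert binomial_mod_p(n, m, p) == comb(n, m) % p
```

If some digit of $m$ exceeds the matching digit of $n$, the factor `comb(n % p, m % p)` is zero, so the whole coefficient is a multiple of $p$.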

5.62  $\binom{np}{mp} = \sum_{k_1+\cdots+k_n=mp} \binom{p}{k_1} \ldots \binom{p}{k_n} \equiv \binom{n}{m} \pmod{p^2}$, because all terms of the sum are multiples of $p^2$ except the terms in which exactly $m$ of the $k$’s are equal to $p$. (Stanley [275, exercise 1.6(d)] shows that the congruence actually holds modulo $p^3$ when $p > 3$.)

5.63  This is $S_n = \sum_{k\ge0} (-4)^k \binom{n+k}{2k} = \sum_{k=0}^{n} (-4)^{n-k} \binom{2n-k}{k}$. The denominator of (5.74) is zero when $z = -1/4$, so we can’t simply plug into that formula. The recurrence $S_n = -2S_{n-1} - S_{n-2}$ leads to the solution $S_n = (-1)^n (2n+1)$.
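
Both forms of the sum and the closed form can be compared for small $n$. This sketch assumes the reading $S_n = \sum_k (-4)^k \binom{n+k}{2k}$; the function names are made up for the check.

```python
from math import comb

def s_first_form(n):
    """sum over k of (-4)^k C(n+k, 2k); terms vanish for k > n."""
    return sum((-4) ** k * comb(n + k, 2 * k) for k in range(n + 1))

def s_second_form(n):
    """The same sum after replacing k by n - k."""
    return sum((-4) ** (n - k) * comb(2 * n - k, k) for k in range(n + 1))

for n in range(25):
    assert s_first_form(n) == s_second_form(n) == (-1) ** n * (2 * n + 1)

# The recurrence quoted in the answer.
for n in range(2, 25):
    assert s_first_form(n) == -2 * s_first_form(n - 1) - s_first_form(n - 2)
```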

5.64  $\sum_{k\ge0} \bigl( \binom{n}{2k} + \binom{n}{2k+1} \bigr) / (k+1) = \sum_{k\ge0} \binom{n+1}{2k+1} / (k+1)$, which is

$$\frac{2}{n+2} \sum_{k\ge0} \binom{n+2}{2k+2} = \frac{2^{n+2} - 2}{n+2} .$$

5.65  Multiply both sides by $n^{n-1}$ and replace $k$ by $n - 1 - k$ to get

$$\sum_k \binom{n-1}{k} n^k (n-k)! = (n-1)! \sum_k \bigl( n^{k+1}/k! - n^k/(k-1)! \bigr) = (n-1)!\; n^n/(n-1)! .$$

(The partial sums can, in fact, be found by Gosper’s algorithm.) Alternatively, $\binom{n}{k} k\, n^{n-1-k} k!$ can be interpreted as the number of mappings of $\{1, \ldots, n\}$ into itself with $f(1)$, ..., $f(k)$ distinct but $f(k+1) \in \{f(1), \ldots, f(k)\}$; summing on $k$ must give $n^n$.

5.66  This is a “walk the garden path” problem where there’s only one “obvious” way to proceed at every step. First replace $k - j$ by $l$, then replace $\lfloor \sqrt{l} \rfloor$ by $k$, getting

$$\sum_{j,k\ge0} \binom{-1}{j-k} \binom{j}{m} \frac{2k+1}{2^j} .$$

The infinite series converges because the terms for fixed $j$ are dominated by a polynomial in $j$ divided by $2^j$. Now sum over $k$, getting

$$\sum_{j\ge0} \binom{j}{m} \frac{j+1}{2^j} .$$

Absorb the $j + 1$ and apply (5.57) to get the answer, $4(m+1)$.

5.67  $3 \binom{2n+2}{n+5}$ by (5.26), because


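5.68  Using the fact that

$$\sum_{k \le n/2} \binom{n}{k} = 2^{n-1} + \frac12 \binom{n}{n/2} [n \text{ is even}] ,$$

we get $n \bigl( 2^{n-1} - \binom{n-1}{\lfloor n/2 \rfloor} \bigr)$.

5.69  Since $\binom{k+1}{2} + \binom{l-1}{2} \le \binom{k}{2} + \binom{l}{2} \iff k < l$, the minimum occurs when the $k$’s are as equal as possible. Hence, by the equipartition formula of Chapter 3, the minimum is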


(n  mod  m) 


(n  — (n  mod  m)) 


^ Ln/mj 


The  boxed 
sentence 
on  the 
other  side 
of  this  page 
is  not  boxed. 


= n 


+ 


(n  mod  m 


n 

m 


A similar  result  holds  for  any  lower  index  in  place  of  2. 

5.70  This is $F(-n, \tfrac12; 1; 2)$; but it’s also $(-2)^{-n} \binom{2n}{n} F(-n, -n; \tfrac12 - n; \tfrac12)$ if we replace $k$ by $n - k$. Now $F(-n, -n; \tfrac12 - n; \tfrac12) = F(-\tfrac n2, -\tfrac n2; \tfrac12 - n; 1)$ by Gauss’s identity (5.111). (Alternatively, $F(-n, -n; \tfrac12 - n; \tfrac12) = 2^{-n} F(-n, \tfrac12; \tfrac12 - n; -1)$ by the reflection law (5.101), and Kummer’s formula (5.94) relates this to (5.55).) The answer is 0 when $n$ is odd, $2^{-n} \binom{n}{n/2}$ when $n$ is even. (See [134, §1.2] for another derivation. This sum arises in the study of a simple search algorithm [164].)

5.71  (a) $S(z) = \sum_{k\ge0} a_k z^{m+k}/(1-z)^{m+2k+1} = z^m (1-z)^{-m-1} A\bigl(z/(1-z)^2\bigr)$.

(b) Here $A(z) = \sum_{k\ge0} \binom{2k}{k} (-z)^k/(k+1) = (\sqrt{1+4z} - 1)/2z$, so we have $A\bigl(z/(1-z)^2\bigr) = 1 - z$. Thus $S_n = [z^n] \bigl(z/(1-z)\bigr)^m = \binom{n-1}{n-m}$.

5.72  The stated quantity is $m(m-n) \ldots (m-(k-1)n)\, n^{k-\nu(k)}/k!$. Any prime divisor $p$ of $n$ divides the numerator at least $k - \nu(k)$ times and divides the denominator at most $k - \nu(k)$ times, since this is the number of times 2 divides $k!$. A prime $p$ that does not divide $n$ must divide the product $m(m-n) \ldots (m-(k-1)n)$ at least as often as it divides $k!$, because $m(m-n) \ldots (m-(p^r-1)n)$ is a multiple of $p^r$ for all $r \ge 1$ and all $m$.

5.73  Plugging in $X_n = n!$ yields $\alpha = \beta = 1$; plugging in $X_n = n¡$ yields $\alpha = 1$, $\beta = 0$. Therefore the general solution is $X_n = \alpha n¡ + \beta(n! - n¡)$.

5'74  ^k^n- 

5.75 The recurrence $S_k(n+1)=S_k(n)+S_{(k-1)\bmod3}(n)$ makes it possible to verify inductively that two of the $S$'s are equal and that $S_{(-n)\bmod3}(n)$ differs from them by $(-1)^n$. These three values split their sum $S_0(n)+S_1(n)+S_2(n)=2^n$ as equally as possible, so there must be $2^n\bmod3$ occurrences of $\lceil2^n/3\rceil$ and $3-(2^n\bmod3)$ occurrences of $\lfloor2^n/3\rfloor$.
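
The three claims in 5.75 are easy to test by brute force. The sketch below assumes $S_k(n)=\sum_{j\equiv k\ (\mathrm{mod}\ 3)}\binom nj$, which is the reading suggested by the recurrence; the helper names are illustrative.

```python
from math import comb

def S(k, n):
    """Sum of C(n, j) over all j in 0..n with j congruent to k modulo 3."""
    return sum(comb(n, j) for j in range(k, n + 1, 3))

def check_575(n):
    """Check the equal pair, the odd one out, and the near-equal split of 2^n."""
    vals = [S(k, n) for k in range(3)]
    odd = (-n) % 3                       # index of the value that differs
    others = [vals[k] for k in range(3) if k != odd]
    q, r = divmod(2 ** n, 3)             # r is 1 or 2, so the ceiling is q + 1
    expected = sorted([q + 1] * r + [q] * (3 - r))
    return (others[0] == others[1]
            and vals[odd] - others[0] == (-1) ** n
            and sorted(vals) == expected)
```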

5.76 $Q_{n,k}=(n+1)\binom nk-\binom n{k+1}$.

5.77 The terms are zero unless $k_1\le\cdots\le k_m$, when the product is the multinomial coefficient
$$\binom{k_m}{k_1,\;k_2-k_1,\;\ldots,\;k_m-k_{m-1}}.$$
Therefore the sum over $k_1,\ldots,k_{m-1}$ is $m^{k_m}$, and the final sum over $k_m$ yields $(m^{n+1}-1)/(m-1)$.

5.78 Extend the sum to $k=2m^2+m-1$; the new terms are $\binom14+\binom26+\cdots+\binom{m-1}{2m}=0$. Since $m\perp(2m+1)$, the pairs $(k\bmod m,\;k\bmod(2m+1))$ are distinct. Furthermore, the numbers $(2j+1)\bmod(2m+1)$ as $j$ varies from $0$ to $2m$ are the numbers $0,1,\ldots,2m$ in some order. Hence the sum is
$$\sum_{0\le k<m,\;0\le j<2m+1}\binom kj=\sum_{0\le k<m}2^k=2^m-1.$$
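
A direct evaluation agrees with this argument. The sketch assumes the sum being evaluated is $\sum_{k=0}^{2m^2}\binom{k\bmod m}{(2k+1)\bmod(2m+1)}$; that summand is inferred from the pairs discussed in the answer, not quoted from the exercise.

```python
from math import comb

def sum_578(m):
    """Assumed sum: C(k mod m, (2k+1) mod (2m+1)) for k = 0 .. 2*m*m."""
    return sum(comb(k % m, (2 * k + 1) % (2 * m + 1))
               for k in range(2 * m * m + 1))
```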

5.79 (a) The sum is $2^{2n-1}$, so the gcd must be a power of 2. If $n=2^kq$ where $q$ is odd, $\binom{2n}{1}$ is divisible by $2^{k+1}$ and not by $2^{k+2}$. Each $\binom{2n}{2j+1}$ is divisible by $2^{k+1}$ (see exercise 36), so this must be the gcd. (b) If $p^r\le n+1<p^{r+1}$, we get the most radix $p$ carries by adding $k$ to $n-k$ when $k=p^r-1$. The number of carries in this case is $r-\epsilon_p(n+1)$, and $r=\epsilon_p(L(n+1))$.

5.80 First prove by induction that $k!\ge(k/e)^k$.

5.81 Let $f_{l,m,n}(x)$ be the left-hand side. It is sufficient to show that we have $f_{l,m,n}(1)>0$ and that $f'_{l,m,n}(x)<0$ for $0\le x\le1$. The value of $f_{l,m,n}(1)$ is $(-1)^{n-m-1}\binom{l+m+\theta}{l+n}$ by (5.23), and this is positive because the binomial coefficient has exactly $n-m-1$ negative factors. The inequality is true when $l=0$, for the same reason. If $l>0$, we have $f'_{l,m,n}(x)=-l\,f_{l-1,m,n+1}(x)$, which is negative by induction.
5.82 Let $\epsilon_p(a)$ be the exponent by which the prime $p$ divides $a$, and let $m=n-k$. The identity to be proved reduces to
$$\min\bigl(\epsilon_p(m)-\epsilon_p(m+k),\;\epsilon_p(m+k+1)-\epsilon_p(k+1),\;\epsilon_p(k)-\epsilon_p(m+1)\bigr)$$
$$=\min\bigl(\epsilon_p(k)-\epsilon_p(m+k),\;\epsilon_p(m)-\epsilon_p(k+1),\;\epsilon_p(m+k+1)-\epsilon_p(m+1)\bigr).$$
For brevity let's write this as $\min(x_1,y_1,z_1)=\min(x_2,y_2,z_2)$. Notice that $x_1+y_1+z_1=x_2+y_2+z_2$. The general relation
$$\epsilon_p(a)<\epsilon_p(b)\;\Longrightarrow\;\epsilon_p(a)=\epsilon_p(|a\pm b|)$$
allows us to conclude that $x_1\ne x_2\Longrightarrow\min(x_1,x_2)=0$; the same holds also for $(y_1,y_2)$ and $(z_1,z_2)$. It's now a simple matter to complete the proof.

5.83  If  m < n,  the  quantity  (m+™lj_k)  is  a polynomial  in  k of  degree 

less  than  n,  for  each  fixed  j;  hence  the  sum  over  k is  zero.  If  m i>  n and 
if  T is  an  integer  in  the  range  n < r <C  m,  the  quantity  (’^k)  (m+T^  ^ k)  is  a 
polynomial  in  j of  degree  less  than  r,  for  each  fixed  k;  hence  the  sum  over  j is 
zero.  If  m i>  n and  if  r = -d  ™ 1 is  an  integer,  for  0 <C  d < n,  we  have 


(-1  yY. 


hence  the  given  sum  can  be  written 


It 


n\  /m  + n --  j - k\ 
k ) V TTi-  j ) 

erx^rm 


— I — k — 1 

i-i 

-l  - n - 2 
m — 1 


/-n  + k- 

V m~i 


This  is  zero  since  (l|k)  is  a polynomial  in  k of  degree  d < n. 

If $m\ge n$, we have verified the identity for $m$ different values of $r$. We
need  consider  only  one  more  case  to  prove  it  in  general.  Let  r = 0;  then  j = 0 
and  the  sum  is 


^M)k 


Art  + u - 

V ttl. 


by  (5.25).  (Is  there  a substantially  shorter  proof?) 
5.84 Following the hint, we get
$$z\,B_t(z)^{r-1}B_t'(z)=\sum_{k\ge0}\binom{tk+r}{k}\frac{k\,z^k}{tk+r},$$
and a similar formula for $E_t(z)$. Thus the formulas $\bigl(zt\,B_t^{-1}(z)B_t'(z)+1\bigr)B_t(z)^r$ and $\bigl(zt\,E_t^{-1}(z)E_t'(z)+1\bigr)E_t(z)^r$ give the respective right-hand sides of (5.61). We must therefore prove that
$$\bigl(zt\,B_t^{-1}(z)B_t'(z)+1\bigr)B_t(z)^r=\frac{B_t(z)^r}{1-t+t\,B_t(z)^{-1}},$$
$$\bigl(zt\,E_t^{-1}(z)E_t'(z)+1\bigr)E_t(z)^r=\frac{E_t(z)^r}{1-zt\,E_t(z)^t},$$

The  boxed 
sentence 
on  the 
other  side 
of  this  page 
is  self- 
referential 


and  these  follow  from  (5.59). 

5.85 If $f(x)=a_nx^n+\cdots+a_1x+a_0$ is any polynomial of degree $\le n$, we can prove inductively that
$$\sum_{0\le\epsilon_1,\ldots,\epsilon_n\le1}(-1)^{\epsilon_1+\cdots+\epsilon_n}f(\epsilon_1x_1+\cdots+\epsilon_nx_n)=(-1)^n\,n!\,a_n\,x_1\ldots x_n.$$
The stated identity is the special case where $a_n=1/n!$ and $x_k=k^3$.
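
The general identity in 5.85 can be tried on any small polynomial. The following is a minimal sketch with illustrative names; `coeffs[i]` holds the coefficient $a_i$, and the polynomial must have degree at most `len(xs)`.

```python
from itertools import product
from math import factorial, prod

def alternating_sum(coeffs, xs):
    """Sum of (-1)^(e1+...+en) * f(e1*x1 + ... + en*xn) over all 0/1 vectors e."""
    def f(t):
        return sum(a * t ** i for i, a in enumerate(coeffs))
    total = 0
    for eps in product((0, 1), repeat=len(xs)):
        t = sum(e * x for e, x in zip(eps, xs))
        total += (-1) ** sum(eps) * f(t)
    return total

def closed_form(coeffs, xs):
    """Right-hand side (-1)^n n! a_n x_1...x_n, where n = len(xs)."""
    n = len(xs)
    return (-1) ** n * factorial(n) * coeffs[n] * prod(xs)
```

For example, with `coeffs = [3, -1, 4, 1, 5]` and `xs = [1, 8, 27, 64]` (the cubes used in the stated special case), both functions should return the same integer.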

5.86 (a) First expand with $n(n-1)$ index variables $l_{ij}$ for all $i\ne j$. Setting $k_{ij}=l_{ij}-l_{ji}$ for $1\le i<j<n$ and using the constraints $\sum_{j\ne i}(l_{ij}-l_{ji})=0$ for all $i<n$ allows us to carry out the sums on $l_{jn}$ for $1\le j<n$ and then on $l_{ji}$ for $1\le i<j<n$ by Vandermonde's convolution. (b) $f(z)-1$ is a polynomial of degree $<n$ that has $n$ roots, so it must be zero. (c) Consider the constant terms in
$$\prod_{1\le i,j\le n,\;i\ne j}\Bigl(1-\frac{z_i}{z_j}\Bigr)^{a_i}=\sum_{k=1}^n\;\prod_{1\le i,j\le n,\;i\ne j}\Bigl(1-\frac{z_i}{z_j}\Bigr)^{a_i-[i=k]}.$$


5.87  The  first  term  is  (nkk)zmk,  by  (5.61).  The  summands  in  the  second 
term  are 


k^O 


(n+  l)/m  + (1+l/m)kV  )k+n+1 


k>n 


'1+1/m)k  — n — 1 
k-n-  1 


(Cz)k. 
Since $\sum_{0\le j<m}(\zeta^{2j+1})^k=m(-1)^l[k=ml]$, these terms sum to


L 

k > r / m 


(1+1/m)mk  it 
m k - n - - 1 


k 


(m+1  )k  — n ~1  j 
k 


zmk 


Incidentally, the functions $B_{-m}(z^m)$ and $\zeta^{2j+1}z\,B_{1+1/m}(\zeta^{2j+1}z)^{1/m}$ are the $m+1$ complex roots of the equation $w^{m+1}-w^m=z^m$.

5.88 Use the facts that $\int_0^\infty(e^{-t}-e^{-nt})\,dt/t=\ln n$ and $(1-e^{-t})/t\le1$. (We have $\binom xk=O(k^{-x-1})$ as $k\to\infty$, by (5.83); so this bound implies that Stirling's series $\sum_ks_k\binom xk$ converges when $x>-1$. Hermite [155] showed that the sum is $\ln\Gamma(1+x)$.)


5.89 Adding this to (5.19) gives $y^{-r}(x+y)^{m+r}$ on both sides, by the binomial theorem. Differentiation gives


L 

k>  m 


m + r\  /m  — k 


n 


xkym  k n 


L 

k>m 


— r\  /m  — k 


n 


( — x)k  (x  + y) 


m k n 


and  we  can  replace  k by  k + m + 1 and  apply  (5.15)  to  get 


The  boxed 
sentence 
on  the 
other  side 
of  this  page 
is  not  self- 
referential. 


— n — 1 
k 


(—  x)m+1+ky 


-k— n 


-n-  1 


,m+ 1 +k 


(x  + y 


-k— n 


In  hypergeometric  form,  this  reduces  to 


( 1 — r,  n + 1 — x\ 

V ^+2  ~VJ 


x 

H 

y 


-n-  1 


m+1  +r,  n + 1 
m + 2 


x + y 


which is the special case $(a,b,c,z)=(n+1,\;m+1+r,\;m+2,\;-x/y)$ of the reflection law (5.101). (Thus (5.105) is related to reflection and to the formula in exercise 52.)

5.90 If $r$ is a nonnegative integer, the sum is finite, and the derivation in the text is valid as long as none of the terms of the sum for $0\le k\le r$ has zero in the denominator. Otherwise the sum is infinite, and the $k$th term $\binom{k-r-1}{k}\big/\binom{k-s-1}{k}$ is approximately $k^{s-r}(-s-1)!/(-r-1)!$ by (5.83). So we
Burma-Shave


need $r>s+1$ if the infinite series is going to converge. (If $r$ and $s$ are complex, the condition is $\Re r>\Re s+1$, because $|k^z|=k^{\Re z}$.) The sum is
$$\frac{\Gamma(r-s-1)\,\Gamma(-s)}{\Gamma(r-s)\,\Gamma(-s-1)}=\frac{s+1}{s+1-r}$$
by (5.92); this is the same formula we found when $r$ and $s$ were integers.

5.91 (It's best to use a program like MACSYMA for this.) Incidentally, when $c=(a+1)/2$, this reduces to an identity that's equivalent to (5.110), in view of Pfaff's reflection law. For if $w=-z/(1-z)$ we have $4w(1-w)=-4z/(1-z)^2$, and
$$F\bigl(\tfrac12a,\;\tfrac12a+\tfrac12-b;\;1+a-b;\;4w(1-w)\bigr)=F\bigl(a,\;a+1-2b;\;1+a-b;\;w\bigr)=(1-z)^a\,F\bigl(a,\;b;\;1+a-b;\;z\bigr).$$


5.92  The  identities  can  be  proved,  as  Clausen  proved  them  more  than  150 
years  ago,  by  showing  that  both  sides  satisfy  the  same  differential  equation. 
One  way  to  write  the  resulting  equations  between  coefficients  of  zn  is  in  terms 
of  binomial  coefficients: 


r OMAHA)  = A a 

r r\,/2)(,+;  ;/2)  mcv'2) 


L 


(Tk  'A’havhav’) 

( T’K  TV’) 

_ (A2)r/2A‘)(  1,2  A 


( 


-1  + r+sl 


n S)(  n S) 
Another  way  is  in  terms  of  hypergeometrics: 


/ a,b,2-a-b-n,-n  1 \ (2a)n  (a  + b)n  (2b)n  _ 

Vj  + a+b,  1 — a— n,  1 — b— n I / (2a  + 2b)n  an  bn 

( $ + a,  l fb,a+b-n,-n  \ 

Vl+a  + b,|  + a-n,|+b-n  ) 

(1/2)""  (1/2  + a — b) 27  (1/2  — a + b)^ 
(1  + a + b)n  (1/4  - a)n  (1/4  — b)n 


5.93 $\alpha^{-1}\prod_{j=1}^{k-1}\bigl(f(j)+\alpha\bigr)/f(j)$. (The special case when $f$ is a polynomial of degree 2 is equivalent to identity (5.133).)
5.94 This is a consequence of Henrici's “friendly monster” identity,


f ( a,  z)  f ( a,  wz)  f ( a,  ukz) 


= F 


w> 


2^+1 


— CL  — Q.+  --  — Cl  T — — CL  — — — Cl  — Q.  — Q 
3 u’  3 u ' 3 1 3 u ' 3 ’ 3 u 3 > 3 u>  3 u ' 3 > u 


where  f (a,  z)  = F(;  a;  z).  This  identity  can  be  proved  by  showing  that  both 
sides  satisfy  the  same  differential  equation.  If  we  replace  3n  by  3n  + 1 or 
3n  + 2,  the  given  sum  is  zero. 

5.95  See  [78]  for  partial  results.  The  computer  experiments  were  done  by 
V.  A.  Vyssotsky. 

5.96 All large $n$ have the property, according to Sárközy [256′]. Paul Erdős conjectures that, in fact, $\max_p\epsilon_p\bigl(\binom{2n}{n}\bigr)$ tends to infinity as $n\to\infty$.

5.97 The congruence surely holds if $2n+1$ is prime. Steven Skiena has also found the example $n=2953$, when $2n+1=3\cdot11\cdot179$.

6.1  2314,  2431,  3241,  1342,  3124,  4132,  4213,  1423,  2143,  3412,  4321. 

6.2 ${n\brace k}m^{\underline k}$, because every such function partitions its domain into $k$ nonempty subsets, and there are $m^{\underline k}$ ways to assign function values for each partition. (Summing over $k$ gives a combinatorial proof of (6.10).)
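
A brute-force count confirms the formula for small cases. The sketch below enumerates all functions from an $n$-element set into an $m$-element set and keeps those that take exactly $k$ distinct values; all names are illustrative.

```python
from itertools import product

def stirling2(n, k):
    """Stirling subset number {n k}, via {n k} = k{n-1 k} + {n-1 k-1}."""
    if n == k:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def falling(m, k):
    """Falling power m(m-1)...(m-k+1)."""
    result = 1
    for i in range(k):
        result *= m - i
    return result

def count_functions(n, m, k):
    """Number of functions {1..n} -> {1..m} with exactly k distinct values."""
    return sum(1 for f in product(range(m), repeat=n) if len(set(f)) == k)
```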

6.3 Now $d_{k+1}\le(\text{center of gravity})-\epsilon=1-\epsilon+(d_1+\cdots+d_k)/k$. This recurrence is like (6.55) but with $1-\epsilon$ in place of $1$; hence the optimum solution is $d_{k+1}=(1-\epsilon)H_k$. This is unbounded as long as $\epsilon<1$.

6.4 $H_{2n+1}-\frac12H_n$. (Similarly $\sum_{k=1}^{2n}(-1)^{k-1}/k=H_{2n}-H_n$.)

6.5 $U_n(x,y)$ is equal to
$$x\sum_{k\ge1}\binom nk(-1)^{k-1}k^{-1}(x+ky)^{n-1}+y\sum_{k\ge1}\binom nk(-1)^{k-1}(x+ky)^{n-1},$$
and the first sum is $U_{n-1}(x,y)+\sum_{k\ge1}\binom{n-1}{k-1}(-1)^{k-1}k^{-1}(x+ky)^{n-1}$. The remaining $k^{-1}$ can be absorbed, and we have $\sum_{k\ge1}\binom nk(-1)^{k-1}(x+ky)^{n-1}=x^{n-1}+\sum_{k\ge0}\binom nk(-1)^{k-1}(x+ky)^{n-1}=x^{n-1}$. This proves (6.75). Let $R_n(x,y)=x^{-n}U_n(x,y)$; then $R_0(x,y)=0$ and $R_n(x,y)=R_{n-1}(x,y)+1/n+y/x$, hence $R_n(x,y)=H_n+ny/x$. (Incidentally, the original sum $U_n=U_n(n,-1)$ doesn't lead to a recurrence such as this; therefore the more general sum, which detaches $x$ from its dependence on $n$, is easier to solve inductively than its special case. This is another instructive example where a strong induction hypothesis makes the difference between success and failure.)
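
The resulting closed form $U_n(x,y)=x^n(H_n+ny/x)$ can be checked exactly. This sketch assumes the definition $U_n(x,y)=\sum_{k\ge1}\binom nk(-1)^{k-1}k^{-1}(x+ky)^n$, which is the form consistent with the decomposition above; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def U(n, x, y):
    """Assumed definition: sum over k >= 1 of C(n,k) (-1)^(k-1) (x+ky)^n / k."""
    return sum(Fraction((-1) ** (k - 1) * comb(n, k), k) * (x + k * y) ** n
               for k in range(1, n + 1))

def H(n):
    """Harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def U_closed(n, x, y):
    """Closed form x^n (H_n + n y / x) that follows from R_n = H_n + n y / x."""
    return x ** n * (H(n) + n * y / x)
```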

6.6  Each  pair  of  babies  bb  present  at  the  end  of  a month  becomes  a pair 
of  adults  aa  at  the  end  of  the  next  month;  and  each  pair  aa  becomes  an 


Ilan Vardi notes that the condition holds for $2n+1=p^2$, where $p$ is prime, if and only if $2^{p-1}\bmod p^2=1$. This yields two more examples: $n=(1093^2-1)/2$; $n=(3511^2-1)/2$.


The  Fibonacci  re- 
currence is  additive, 
but  the  rabbits  are 
multiplying. 
If the harmonic numbers are worm numbers, the Fibonacci numbers are rabbit numbers.


aa and a bb. Thus each bb behaves like a drone in the bee tree and each aa behaves like a queen, except that the bee tree goes backward in time while the rabbits are going forward. There are $F_{n+1}$ pairs of rabbits after $n$ months; $F_n$ of them are adults and $F_{n-1}$ are babies. (This is the context in which Fibonacci originally introduced his numbers.)
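
The rabbit rules translate directly into a two-variable update. The sketch below assumes the population starts with a single pair of babies at month 0; names are illustrative.

```python
def rabbits(months):
    """Return [(adult pairs, baby pairs)] for months 0 .. months."""
    adults, babies = 0, 1            # assumption: one baby pair at month 0
    history = [(adults, babies)]
    for _ in range(months):
        # every baby pair grows up; every adult pair stays and adds a baby pair
        adults, babies = adults + babies, adults
        history.append((adults, babies))
    return history

def fib(n):
    """Fibonacci number F_n for n >= 0, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```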

6.7 (a) Set $k=1-n$ and apply (6.107). (b) Set $m=1$ and $k=n-1$ and apply (6.128).

6.8 $55+8+2$ becomes $89+13+3=105$; the true value is 104.607361.

6.9 21. (We go from $F_n$ to $F_{n+2}$ when the units are squared. The true answer is about 20.72.)


6.10 The partial quotients $a_0$, $a_1$, $a_2$, ... are all equal to 1, because $\phi=1+1/\phi$. (The Stern–Brocot representation is therefore RLRLRLRLRL....)

6.11 $(-1)^{\overline n}=[n=0]-[n=1]$; see (6.11).

6.12 This is a consequence of (6.31) and its dual in Table 250.

6.13 The two formulas are equivalent, by exercise 12. We can use induction. Or we can observe that $z^nD^n$ applied to $f(z)=z^x$ gives $x^{\underline n}z^x$ while $\vartheta^n$ applied to the same function gives $x^nz^x$; therefore the sequence $\langle\vartheta^0,\vartheta^1,\vartheta^2,\ldots\rangle$ must relate to $\langle z^0D^0,z^1D^1,z^2D^2,\ldots\rangle$ as $\langle x^0,x^1,x^2,\ldots\rangle$ relates to $\langle x^{\underline0},x^{\underline1},x^{\underline2},\ldots\rangle$.

6.14 We have
$$x\binom{x+k}{n}=(k+1)\binom{x+k}{n+1}+(n-k)\binom{x+k+1}{n+1},$$
because $(n+1)x=(k+1)(x+k-n)+(n-k)(x+k+1)$. (It suffices to verify the latter identity when $k=0$, $k=-1$, and $k=n$.)

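6.15 Since $\Delta\bigl(\binom{x+k}{n}\bigr)=\binom{x+k}{n-1}$, we have the general formula
$$\sum_k\left\langle{n\atop k}\right\rangle\binom{x+k}{n-m}=\Delta^m(x^n)=\sum_j\binom mj(-1)^{m-j}(x+j)^n.$$
Set $x=0$ and appeal to (6.19).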

6.16 $A_{n,k}=\sum_{j\ge0}a_j{n-j\brace k}$; this sum is always finite.

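6.17 (a) $\left|{n\atop k}\right|={n+1\brack n+1-k}$. (b) $\left|{n\atop k}\right|=n^{\underline{n-k}}=n!\,[n\ge k]/k!$. (c) $\left|{n\atop k}\right|=k!{n\brace k}$.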

6.18 This is equivalent to (6.3) or (6.8). (It follows in particular that $\sigma_n(1)=-n\sigma_n(0)=B_n/n!$ when $n>1$.)

6.19  Use  Table  258. 

6.20 $\sum_{1\le j\le k\le n}1/j^2=\sum_{1\le j\le n}(n+1-j)/j^2=(n+1)H_n^{(2)}-H_n$.
6.21 The hinted number is a sum of fractions with odd denominators, so
it  has  the  form  a/b  with  a and  b odd.  (Incidentally,  Bertrand’s  postulate 
implies  that  bn  is  also  divisible  by  at  least  one  odd  prime,  whenever  n > 2.) 

6.22 $|z/k(k+z)|\le2|z|/k^2$ when $k>2|z|$, so the sum is well defined when the denominators are not zero. If $z=n$ we have $\sum_{k=1}^m\bigl(1/k-1/(k+n)\bigr)=H_m-H_{m+n}+H_n$, which approaches $H_n$ as $m\to\infty$. (The quantity $H_{z-1}-\gamma$ is often called the psi function $\psi(z)$.)

6.23 $z/(e^z+1)=z/(e^z-1)-2z/(e^{2z}-1)=\sum_{n\ge0}(1-2^n)B_nz^n/n!$.

6.24 When $n$ is odd, $T_n(x)$ is a polynomial in $x^2$, hence its coefficients are multiplied by even numbers when we form the derivative and compute $T_{n+1}(x)$ by (6.95). (In fact we can prove more: The Bernoulli number $B_{2n}$ always has 2 to the first power in its denominator, by exercise 54; hence $2^{2n-k}\,\backslash\!\backslash\,T_{2n+1}\iff2^k\,\backslash\!\backslash\,(n+1)$. The odd positive integers $(n+1)T_{2n+1}/2^{2n}$ are called Genocchi numbers $\langle1,1,3,17,155,2073,\ldots\rangle$, after Genocchi [117].)

6.25 $100n-nH_n<100(n-1)-(n-1)H_{n-1}\iff H_{n-1}>99$. (The least such $n$ is approximately $e^{99-\gamma}$, while he finishes at $N\approx e^{100-\gamma}$, about $e$ times as long. So he is getting closer during the final 63% of his journey.)

6.26 Let $u(k)=H_{k-1}$ and $\Delta v(k)=1/k$, so that $u(k)=v(k)$. Then we have $S_n-H_n^{(2)}=\sum_{k=1}^nH_{k-1}/k=H_{k-1}^2\big|_1^{n+1}-S_n=H_n^2-S_n$.

6.27 Observe that when $m>n$ we have $\gcd(F_m,F_n)=\gcd(F_{m-n},F_n)$ by (6.108). This yields a proof by induction.

6.28 (a) $Q_n=\alpha(L_n-F_n)/2+\beta F_n$. (The solution can also be written $Q_n=\alpha F_{n-1}+\beta F_n$.) (b) $L_n=\phi^n+\hat\phi^n$.

6.29 When $k=0$ the identity is (6.133). When $k=1$ it is, essentially,
$$K(x_1,\ldots,x_n)\,x_m=K(x_1,\ldots,x_m)\,K(x_m,\ldots,x_n)-K(x_1,\ldots,x_{m-2})\,K(x_{m+2},\ldots,x_n);$$
in Morse code terms, the second product on the right subtracts out the cases where the first product has intersecting dashes. When $k>1$, an induction on $k$ suffices, using both (6.127) and (6.132). (The identity is also true when one or more of the subscripts on $K$ become $-1$, if we adopt the convention that $K_{-1}=0$. When multiplication is not commutative, Euler's identity remains valid if we write it in the form
$$K_{m+n}(x_1,\ldots,x_{m+n})\,K_k(x_{m+k},\ldots,x_{m+1})$$
$$=K_{m+k}(x_1,\ldots,x_{m+k})\,K_n(x_{m+n},\ldots,x_{m+1})+(-1)^kK_{m-1}(x_1,\ldots,x_{m-1})\,K_{n-k-1}(x_{m+n},\ldots,x_{m+k+2}).$$
For example, we obtain the somewhat surprising noncommutative factorizations
$$(abc+a+c)(1+ba)=(ab+1)(cba+a+c)$$
from the case $k=2$, $m=0$, $n=3$.)

6.30 The derivative of $K(x_1,\ldots,x_n)$ with respect to $x_m$ is
$$K(x_1,\ldots,x_{m-1})\,K(x_{m+1},\ldots,x_n),$$
and the second derivative is zero; hence the answer is
$$K(x_1,\ldots,x_n)+K(x_1,\ldots,x_{m-1})\,K(x_{m+1},\ldots,x_n)\,y.$$


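6.31 Since $x^{\overline n}=(x+n-1)^{\underline n}=\sum_k\binom nkx^{\underline k}(n-1)^{\underline{n-k}}$, we have $\left|{n\atop k}\right|=\binom nk(n-1)^{\underline{n-k}}$. These coefficients, incidentally, satisfy the recurrence
$$\left|{n\atop k}\right|=(n-1+k)\left|{n-1\atop k}\right|+\left|{n-1\atop k-1}\right|,\qquad\text{integers }n,k>0.$$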
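
Both the expansion of rising powers and the recurrence in 6.31 can be verified mechanically. A minimal sketch, with illustrative names:

```python
from math import comb

def falling(x, k):
    """Falling power x(x-1)...(x-k+1); an empty product (k <= 0) gives 1."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

def rising(x, k):
    """Rising power x(x+1)...(x+k-1)."""
    result = 1
    for i in range(k):
        result *= x + i
    return result

def coeff(n, k):
    """Coefficient C(n,k) * (n-1) falling (n-k) from the answer."""
    return comb(n, k) * falling(n - 1, n - k)

def expansion_holds(n, x):
    """Check x rising n == sum_k coeff(n,k) * (x falling k)."""
    return rising(x, n) == sum(coeff(n, k) * falling(x, k) for k in range(n + 1))
```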


6.32 $\sum_{k\le m}k{n+k\brace k}={m+n+1\brace m}$ and $\sum_{0\le k\le n}{k\brace m}(m+1)^{n-k}={n+1\brace m+1}$, both of which appear in Table 251.

6.33 If $n>0$, we have ${n\brack3}=\frac12(n-1)!\,\bigl(H_{n-1}^2-H_{n-1}^{(2)}\bigr)$, by (6.71); ${n\brace3}=\frac16(3^n-3\cdot2^n+3)$, by (6.19).

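6.34 We have $\left\langle{-1\atop k}\right\rangle=1/(k+1)$, $\left\langle{-2\atop k}\right\rangle=H_{k+1}^{(2)}$, and in general $\left\langle{n\atop k}\right\rangle$ is given by (6.38) for all integers $n$.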

6.35 Let $n$ be the least integer $>1/\epsilon$ such that $\lfloor H_n\rfloor>\lfloor H_{n-1}\rfloor$.

6.36 Now $d_{k+1}=\bigl(100+(1+d_1)+\cdots+(1+d_k)\bigr)/(100+k)$, and the solution is $d_{k+1}=H_{k+100}-H_{101}+1$ for $k\ge1$. This exceeds 2 when $k\ge176$.

6.37 The sum (by parts) is $H_{mn}-\bigl(\frac mm+\frac m{2m}+\cdots+\frac m{mn}\bigr)=H_{mn}-H_n$. The infinite sum is therefore $\ln m$. (It follows that
$$\sum_{k\ge1}\frac{\nu_m(k)}{k(k+1)}=\frac m{m-1}\ln m,$$
because $\nu_m(k)=(m-1)\sum_{j\ge1}(k\bmod m^j)/m^j$.)

6.38 $(-1)^k\bigl(\binom{r-1}{k}r^{-1}-\binom{r-1}{k-1}H_k\bigr)+C$. (By parts, using (5.16).)

6.39 Write it as $\sum_{1\le j\le n}j^{-1}\sum_{j\le k\le n}H_k$ and sum first on $k$ via (6.67), to get $(n+1)H_n^2-(2n+1)H_n+2n$.
6.40 If $6n-1$ is prime, the numerator of
$$\sum_{k=1}^{4n-1}\frac{(-1)^{k-1}}{k}=H_{4n-1}-H_{2n-1}$$
is divisible by $6n-1$, because the sum is
$$\sum_{k=2n}^{4n-1}\frac1k=\sum_{k=2n}^{3n-1}\Bigl(\frac1k+\frac1{6n-1-k}\Bigr)=\sum_{k=2n}^{3n-1}\frac{6n-1}{k(6n-1-k)}.$$
Similarly if $6n+1$ is prime, the numerator of $\sum_{k=1}^{4n}(-1)^{k-1}/k=H_{4n}-H_{2n}$ is a multiple of $6n+1$. For 1987 we sum up to $k=1324$.
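
The divisibility claims in 6.40 can be confirmed with exact fractions, including the case 1987 (where $n=331$ and $4n=1324$). A minimal sketch with illustrative names:

```python
from fractions import Fraction

def alternating_harmonic(m):
    """1 - 1/2 + 1/3 - ... up to the term with denominator m, in lowest terms."""
    return sum(Fraction((-1) ** (k - 1), k) for k in range(1, m + 1))

def is_prime(n):
    """Trial division; adequate for the small values used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def check_640(n):
    """Check both cases of the answer for a given n >= 1."""
    ok = True
    if is_prime(6 * n - 1):
        ok = ok and alternating_harmonic(4 * n - 1).numerator % (6 * n - 1) == 0
    if is_prime(6 * n + 1):
        ok = ok and alternating_harmonic(4 * n).numerator % (6 * n + 1) == 0
    return ok
```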

6.41 $S_{n+1}=\sum_k\binom{\lfloor(n+1+k)/2\rfloor}{k}=\sum_k\binom{\lfloor(n+k)/2\rfloor}{k-1}$, hence we have $S_{n+1}+S_n=\sum_k\binom{\lfloor(n+k)/2\rfloor+1}{k}=S_{n+2}$. The answer is $F_{n+2}$.

6.42 $F_n$.


6.43 Set $z=\frac1{10}$ in $\sum_{n\ge0}F_nz^n=z/(1-z-z^2)$ to get $\frac{10}{89}$. The sum is a repeating decimal with period length 44:

0.11235 95505 61797 75280 89887 64044 94382 02247 19101 12359 55+
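
The decimal expansion and its period can be reproduced by long division. A minimal sketch with illustrative names; `period_length` computes the multiplicative order of 10 modulo the denominator.

```python
from fractions import Fraction

def decimal_digits(fr, count):
    """First `count` digits after the decimal point of the fraction fr."""
    digits = []
    r = fr.numerator % fr.denominator
    for _ in range(count):
        r *= 10
        digits.append(str(r // fr.denominator))
        r %= fr.denominator
    return "".join(digits)

def period_length(d):
    """Multiplicative order of 10 modulo d, for d coprime to 10."""
    k, r = 1, 10 % d
    while r != 1:
        r = r * 10 % d
        k += 1
    return k

def fibonacci_decimal_sum(terms):
    """Partial sum of F_n / 10^n for n = 0 .. terms-1."""
    total, a, b = Fraction(0), 0, 1
    for n in range(terms):
        total += Fraction(a, 10 ** n)
        a, b = b, a + b
    return total
```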

6.44 Replace $(m,k)$ by $(-m,-k)$ or $(k,-m)$ or $(-k,m)$, if necessary, so that $m\ge k\ge0$. The result is clear if $m=k$. If $m>k$, we can replace $(m,k)$ by $(m-k,m)$ and use induction.

6.45 $X_n=A(n)\alpha+B(n)\beta+C(n)\gamma+D(n)\delta$, where $B(n)=F_n$, $A(n)=F_{n-1}$, $A(n)+B(n)-D(n)=1$, and $B(n)-C(n)+3D(n)=n$.

6.46 $\phi/2$ and $\phi^{-1}/2$. Let $u=\cos72^\circ$ and $v=\cos36^\circ$; then $u=2v^2-1$ and $v=1-2\sin^218^\circ=1-2u^2$. Hence $u+v=2(u+v)(v-u)$, and $4v^2-2v-1=0$. We can pursue this investigation to find the five complex fifth roots of unity:
$$1,\qquad\frac{\phi^{-1}\pm i\sqrt{2+\phi}}{2},\qquad\frac{-\phi\pm i\sqrt{3-\phi}}{2}.$$


6.47 $2^n\sqrt5\,F_n=(1+\sqrt5)^n-(1-\sqrt5)^n$, and the even powers of $\sqrt5$ cancel out. Now let $p$ be an odd prime. Then $\binom p{2k+1}\equiv0$ except when $k=(p-1)/2$, and $\binom{p+1}{2k+1}\equiv0$ except when $k=0$ or $k=(p-1)/2$; hence $F_p\equiv5^{(p-1)/2}$ and $2F_{p+1}\equiv1+5^{(p-1)/2}$ (mod $p$). It can be shown that $5^{(p-1)/2}\equiv1$ when $p$ has the form $10k\pm1$, and $5^{(p-1)/2}\equiv-1$ when $p$ has the form $10k\pm3$.


"Let  p be  any  old 
prime.” 

(See  [140],  p.  419.) 


6.48  This  must  be  true  because  (6.138)  is  a polynomial  identity  and  we  can 
set  a,  = 0. 
6.49 Set $z=\frac12$ in (6.146); the partial quotients are $0$, $2^{F_0}$, $2^{F_1}$, $2^{F_2}$, .... (Knuth [172] noted that this number is transcendental.)

6.50 (a) $f(n)$ is even $\iff3\backslash n$. (b) If the binary representation of $n$ is $(1^{a_1}0^{a_2}\ldots1^{a_{m-1}}0^{a_m})_2$, where $m$ is even, we have $f(n)=K(a_1,a_2,\ldots,a_{m-1})$.

6.51 (a) Combinatorial proof: The arrangements of $\{1,2,\ldots,p\}$ into $k$ subsets or cycles are divided into “orbits” of 1 or $p$ arrangements each, if we add 1 to each element modulo $p$. For example,

$\{1,2,4\}\cup\{3,5\}\to\{2,3,5\}\cup\{4,1\}\to\{3,4,1\}\cup\{5,2\}$
$\to\{4,5,2\}\cup\{1,3\}\to\{5,1,3\}\cup\{2,4\}\to\{1,2,4\}\cup\{3,5\}$.

We get an orbit of size 1 only when this transformation takes an arrangement into itself; but then $k=1$ or $k=p$. Alternatively, there's an algebraic proof: We have $x^p\equiv x^{\underline p}+x^{\underline1}$ and $x^{\underline p}\equiv x^p-x$ (mod $p$), since Fermat's theorem tells us that $x^p-x$ is divisible by $(x-0)(x-1)\ldots(x-(p-1))$.

(b) This result follows from (a) and Wilson's theorem; or we can use $x^{\overline{p-1}}\equiv x^{\overline p}/(x-1)\equiv(x^p-x)/(x-1)=x^{p-1}+x^{p-2}+\cdots+x$.

(c) We have ${p+1\brace k}\equiv{p+1\brack k}\equiv0$ for $3\le k\le p$, then ${p+2\brace k}\equiv{p+2\brack k}\equiv0$ for $4\le k\le p$, etc. (Similarly, we have ${2p-1\brace p}\equiv-{2p-1\brack p}\equiv1$.)

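(d) $p!=p^{\underline p}=\sum_k(-1)^{p-k}p^k{p\brack k}=p^p{p\brack p}-p^{p-1}{p\brack p-1}+\cdots+p^3{p\brack3}-p^2{p\brack2}+p{p\brack1}$. But $p{p\brack1}=p!$, so
$${p\brack2}=p{p\brack3}-p^2{p\brack4}+\cdots+p^{p-2}{p\brack p}$$
is a multiple of $p^2$. (This is called Wolstenholme's theorem.)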

6.52 (a) Observe that $H_n=H_n^*+H_{\lfloor n/p\rfloor}/p$, where $H_n^*=\sum_{k=1}^n[k\perp p]/k$.

(b) Working mod 5 we have $H_r\equiv\langle0,1,4,1,0\rangle$ for $0\le r\le4$. Thus the first solution is $n=4$. By part (a) we know that $5\backslash a_n\Longrightarrow5\backslash a_{\lfloor n/5\rfloor}$; so the next possible range is $n=20+r$, $0\le r\le4$, when we have $H_n=H_n^*+\frac15H_4=H_{20}^*+\frac15H_4+H_r-\sum_{k=1}^r20/k(20+k)$. The numerator of $H_{20}^*$, like the numerator of $H_4$, is divisible by 25. Hence the only solutions in this range are $n=20$ and $n=24$. The next possible range is $n=100+r$; now $H_n=H_n^*+\frac15H_{20}$, which is $\frac15H_{20}+H_r$ plus a fraction whose numerator is a multiple of 5. If $\frac15H_{20}\equiv m$ (mod 5), where $m$ is an integer, the harmonic number $H_{100+r}$ will have a numerator divisible by 5 if and only if $m+H_r\equiv0$ (mod 5); hence $m$ must be $\equiv0$, $1$, or $4$. Working modulo 5 we find $\frac15H_{20}=\frac15H_{20}^*+\frac1{25}H_4\equiv\frac1{25}H_4=\frac1{12}\equiv3$; hence there are no solutions for $100\le n\le104$. Similarly there are none for $120\le n\le124$; we have found all three solutions.

(By exercise 6.51(d), we always have $p^2\backslash a_{p-1}$, $p\backslash a_{p^2-p}$, and $p\backslash a_{p^2-1}$, if $p$ is any prime $\ge5$. The argument just given shows that these are the only solutions to $p\backslash a_n$ if and only if there are no solutions to $p^{-2}H_{p-1}+H_r\equiv0$ (mod $p$) for $0\le r<p$. The latter condition holds not only for $p=5$ but also for $p=13$, 17, 23, 41, and 67, perhaps for infinitely many primes. The numerator of $H_n$ is divisible by 3 only when $n=2$, 7, and 22; it is divisible by 7 only when $n=6$, 42, 48, 295, 299, 337, 341, 2096, 2390, 14675, 16731, 16735, and 102728.)
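
The lists for $p=5$ and $p=3$ can be reproduced by scanning harmonic numbers in lowest terms. This is a minimal sketch with an illustrative function name; a finite scan only confirms the listed solutions up to the chosen limit, whereas completeness relies on the argument in the answer.

```python
from fractions import Fraction

def harmonic_numerator_hits(p, limit):
    """All n in 1..limit for which p divides the numerator of H_n."""
    hits, h = [], Fraction(0)
    for n in range(1, limit + 1):
        h += Fraction(1, n)          # Fraction keeps H_n in lowest terms
        if h.numerator % p == 0:
            hits.append(n)
    return hits
```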


(Attention,  com- 
puter programmers: 
Here's  an  interest- 
ing condition  to 
test,  for  as  many 
primes  as  you  can.) 


6.53 Summation by parts yields
$$\frac{n+1}{(n+2)^2}\left(\frac{(-1)^m}{\binom{n+1}{m+1}}\bigl((n+2)H_{m+1}-1\bigr)-1\right).$$


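6.54 (a) If $m\ge p$ we have $S_m(p)\equiv S_{m-(p-1)}(p)$ (mod $p$), since $k^{p-1}\equiv1$ when $1\le k<p$. Also $S_{p-1}(p)\equiv p-1\equiv-1$. If $0<m<p-1$, we can write
$$S_m(p)=\sum_{j=0}^m{m\brace j}(-1)^{m-j}\sum_{k=0}^{p-1}k^{\overline j}=\sum_{j=0}^m{m\brace j}(-1)^{m-j}\frac{(p-1)^{\overline{j+1}}}{j+1}\equiv0.$$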


(b) The condition in the hint implies that the denominator of $I_{2n}$ is not divisible by any prime $p$; hence $I_{2n}$ must be an integer. To prove the hint, we may assume that $n>1$. Then
$$B_{2n}+\frac{[(p-1)\backslash(2n)]}{p}+\sum_{k=0}^{2n-2}\binom{2n+1}{k}B_k\frac{p^{2n-k}}{2n+1}$$
is an integer, by (6.78), (6.84), and part (a). So we want to verify that none of the fractions $\binom{2n+1}{k}B_kp^{2n-k}/(2n+1)=\binom{2n}{k}B_kp^{2n-k}/(2n-k+1)$ has a denominator divisible by $p$. The denominator of $\binom{2n}{k}B_kp$ isn't divisible by $p$, since $B_k$ has no $p^2$ in its denominator (by induction); and the denominator of $p^{2n-k-1}/(2n-k+1)$ isn't divisible by $p$, since $2n-k+1<p^{2n-k}$ when $k\le2n-2$; QED. (The numbers $I_{2n}$ are tabulated in [185]. Hermite calculated them through $I_{18}$ in 1875 [153]. It turns out that $I_2=I_4=I_6=I_8=I_{10}=I_{12}=1$; hence there is actually a “simple” pattern to the Bernoulli numbers displayed in the text, including $\frac{-691}{2730}$ (!). But the numbers $I_{2n}$ don't seem to have any memorable features when $n>6$. For example, $B_{24}=-86579-\frac12-\frac13-\frac15-\frac17-\frac1{13}$, and 86579 is prime.)

(c) The numbers $2-1$ and $3-1$ always divide $2n$. If $n$ is prime, the only divisors of $2n$ are 1, 2, $n$, and $2n$, so the denominator of $B_{2n}$ for prime $n>2$ will be 6 unless $2n+1$ is also prime. In the latter case we can try $4n+3$, $8n+7$, ..., until we eventually hit a nonprime (since $n$ divides $2^{n-1}n+2^{n-1}-1$). (This proof does not need the more difficult, but true, theorem that there are infinitely many primes of the form $6k+1$.) The denominator of $B_{2n}$ can be 6 also when $n$ has nonprime values, such as 49.
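
The integers $I_{2n}$ are easy to tabulate. The sketch below assumes the definition $I_{2n}=B_{2n}+\sum_{p\text{ prime},\,(p-1)\backslash2n}1/p$, which is the form suggested by the decomposition of $B_{24}$ above; it computes Bernoulli numbers from the standard recurrence $\sum_{j\le m}\binom{m+1}jB_j=[m=0]$. Names are illustrative.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n_max):
    """[B_0, ..., B_n_max] as exact fractions (with B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def is_prime(n):
    """Trial division; adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def I(two_n, B):
    """Assumed definition: B_{2n} plus the sum of 1/p over primes p with (p-1) | 2n."""
    return B[two_n] + sum(Fraction(1, p) for p in range(2, two_n + 2)
                          if is_prime(p) and two_n % (p - 1) == 0)
```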


(The  numerators  of 
Bernoulli  numbers 
have  important 
connections  to 
the  known  results 
about  Fermat's 
Last  Theorem;  see 
Ribenboim  [249].) 
6.55  The  stated  sum  is  (x  jln)  ) , by  Vandermonde’s  convolution. 

To  get  (6.70),  differentiate  and  set  x = 0. 

6.56  First  replace  kn+1  by  ((k  ■*-  mj  + m)  n + 1 and  expand  in  powers  of 

k m;  simplifications  occur  as  in  the  derivation  of  (6.72).  If  m > n or 
m < 0,  the  answer  is  (-1  )nri!  mn/(nrim).  Otherwise  we  need  to  take  the 

limit  of  (5.41)  minus  the  term  for  k = m,  as  x — — m;  the  answer  comes  to 
(-1  )nri!  + (-1  )m+1  (^)mn(n  + 1 + mHn 

6.57 First prove by induction that the $n$th row contains at most three distinct values $A_n\ge B_n\ge C_n$; if $n$ is even they occur in the cyclic order $[C_n,B_n,A_n,B_n,C_n]$, while if $n$ is odd they occur in the cyclic order $[C_n,B_n,A_n,A_n,B_n]$. Also

$A_{2n+1}=A_{2n}+B_{2n}$;  $A_{2n}=2A_{2n-1}$;
$B_{2n+1}=B_{2n}+C_{2n}$;  $B_{2n}=A_{2n-1}+B_{2n-1}$;
$C_{2n+1}=2C_{2n}$;  $C_{2n}=B_{2n-1}+C_{2n-1}$.

It follows that $Q_n=A_n-C_n=F_{n+1}$. (See exercise 5.75 for wraparound binomial coefficients of order 3.)
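
The conclusion $Q_n=F_{n+1}$ can be observed by generating the rows directly. This sketch assumes the rows are the order-5 wraparound Pascal triangle (each entry is the sum of the entry above and its cyclic left neighbor, starting from $1,0,0,0,0$) and that $Q_n$ is the largest entry minus the smallest; names are illustrative.

```python
def wraparound_rows(count, order=5):
    """First `count` rows of the cyclic Pascal triangle of the given order."""
    row = [1] + [0] * (order - 1)
    rows = [row]
    for _ in range(count - 1):
        # index -1 wraps around to the last entry, which closes the cycle
        row = [row[k] + row[k - 1] for k in range(order)]
        rows.append(row)
    return rows

def fib(n):
    """Fibonacci number F_n for n >= 0."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```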

6.58 (a) $\sum_{n\ge0}F_n^2z^n=z(1-z)/(1+z)(1-3z+z^2)=\frac15\bigl((2-3z)/(1-3z+z^2)-2/(1+z)\bigr)$. (b) $\sum_{n\ge0}F_n^3z^n=z(1-2z-z^2)/(1-4z-z^2)(1+z-z^2)=\frac15\bigl(2z/(1-4z-z^2)+3z/(1+z-z^2)\bigr)$. (These formulas are obtained by squaring or cubing Binet's formula (6.123) and summing on $n$, then combining terms so that $\phi$ and $\hat\phi$ disappear.) It follows that $F_{n+1}^3-4F_n^3-F_{n-1}^3=3(-1)^nF_n$. (The corresponding recurrence for $m$th powers has been found by Jarden and Motzkin [163].)
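
Both generating functions and the cubic recurrence can be checked on truncated power series: multiplying the series of squares by $(1+z)(1-3z+z^2)=1-2z-2z^2+z^3$ should leave $z-z^2$, and multiplying the series of cubes by $(1-4z-z^2)(1+z-z^2)=1-3z-6z^2+3z^3+z^4$ should leave $z-2z^2-z^3$. A minimal sketch with illustrative names:

```python
def fibs(count):
    """[F_0, F_1, ..., F_{count-1}]."""
    f = [0, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]

def multiply_truncated(series, poly):
    """Coefficients of series(z) * poly(z), truncated to len(series) terms."""
    out = [0] * len(series)
    for i, s in enumerate(series):
        for j, p in enumerate(poly):
            if i + j < len(out):
                out[i + j] += s * p
    return out
```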

6.59 Let $m$ be fixed. We can prove by induction on $n$ that it is, in fact, possible to find such an $x$ with the additional condition $x\not\equiv2$ (mod 4). If $x$ is such a solution, we can move up to a solution modulo $3^{n+1}$ because
$$F_{8\cdot3^{n-1}}\equiv3^n,\qquad F_{8\cdot3^{n-1}-1}\equiv3^n+1\pmod{3^{n+1}};$$
either $x$ or $x+8\cdot3^{n-1}$ or $x+16\cdot3^{n-1}$ will do the job.

6.60 $F_1+1$, $F_2+1$, $F_3+1$, $F_4-1$, and $F_6-1$ are the only cases. Otherwise the Lucas numbers of exercise 28 arise in the factorizations

$F_{2m}+(-1)^m=L_{m+1}F_{m-1}$;  $F_{2m+1}+(-1)^m=L_mF_{m+1}$;
$F_{2m}-(-1)^m=L_{m-1}F_{m+1}$;  $F_{2m+1}-(-1)^m=L_{m+1}F_m$.

(We have $F_{m+n}-(-1)^nF_{m-n}=L_mF_n$ in general.)

6.61 $1/F_{2m}=F_{m-1}/F_m-F_{2m-1}/F_{2m}$ when $m$ is even and positive. The second sum is $5/4-F_{3\cdot2^n-1}/F_{3\cdot2^n}$, for $n\ge1$.
6.62 (a) $A_n=\sqrt5\,A_{n-1}-A_{n-2}$ and $B_n=\sqrt5\,B_{n-1}-B_{n-2}$. Incidentally, we also have $\sqrt5\,A_n+B_n=2A_{n+1}$ and $\sqrt5\,B_n-A_n=2B_{n-1}$. (b) A table of small values reveals that

$A_n=L_n$ for $n$ even, $A_n=\sqrt5\,F_n$ for $n$ odd;  $B_n=\sqrt5\,F_n$ for $n$ even, $B_n=L_n$ for $n$ odd.

(c) $B_n/A_{n+1}-B_{n-1}/A_n=1/(F_{2n+1}+1)$ because $B_nA_n-B_{n-1}A_{n+1}=\sqrt5$ and $A_nA_{n+1}=\sqrt5\,(F_{2n+1}+1)$. Notice that $B_n/A_{n+1}=(F_n/F_{n+1})[n\text{ even}]+(L_n/L_{n+1})[n\text{ odd}]$. (d) Similarly, $\sum_{k=1}^n1/(F_{2k+1}-1)=(A_0/B_1-A_1/B_2)+\cdots+(A_{n-1}/B_n-A_n/B_{n+1})=2-A_n/B_{n+1}$. This quantity can also be expressed as $(5F_n/L_{n+1})[n\text{ even}]+(L_n/F_{n+1})[n\text{ odd}]$.

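6.63 (a) ${n\brack k}$. There are ${n-1\brack k-1}$ with $\pi_n=n$ and $(n-1){n-1\brack k}$ with $\pi_n<n$.

(b) $\left\langle{n\atop k}\right\rangle$. Each permutation $\rho_1\ldots\rho_{n-1}$ of $\{1,\ldots,n-1\}$ leads to $n$ permutations $\pi_1\pi_2\ldots\pi_n=\rho_1\ldots\rho_{j-1}\,n\,\rho_{j+1}\ldots\rho_{n-1}\,\rho_j$. If $\rho_1\ldots\rho_{n-1}$ has $k$ excedances, there are $k+1$ values of $j$ that yield $k$ excedances in $\pi_1\pi_2\ldots\pi_n$; the remaining $n-1-k$ values yield $k+1$. Hence the total number of ways to get $k$ excedances in $\pi_1\pi_2\ldots\pi_n$ is $(k+1)\left\langle{n-1\atop k}\right\rangle+\bigl((n-1)-(k-1)\bigr)\left\langle{n-1\atop k-1}\right\rangle=\left\langle{n\atop k}\right\rangle$.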

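6.64 The denominator of $\binom{1/2}{2n}$ is $2^{4n-\nu_2(n)}$, by the proof in exercise 5.72. The denominator of ${1/2\brack1/2-n}$ is the same, by (6.44), because $\left\langle\!\left\langle{n\atop0}\right\rangle\!\right\rangle=1$ and $\left\langle\!\left\langle{n\atop k}\right\rangle\!\right\rangle$ is even for $k>0$.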

6.65 This is equivalent to saying that $\left\langle{n\atop k}\right\rangle/n!$ is the probability that we have $\lfloor x_1+\cdots+x_n\rfloor=k$, when $x_1,\ldots,x_n$ are independent random numbers uniformly distributed between 0 and 1. Let $y_j=(x_1+\cdots+x_j)\bmod1$. Then $y_1,\ldots,y_n$ are independently and uniformly distributed, and $\lfloor x_1+\cdots+x_n\rfloor$ is the number of descents in the $y$'s. The permutation of the $y$'s is random, and the probability of $k$ descents is the same as the probability of $k$ ascents.
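
The key observation, that the floor of the sum equals the number of descents among the $y$'s, holds for every individual sample and not only on average, so it can be tested exactly. The sketch below uses rational samples in $[0,1)$ to avoid floating-point rounding; names are illustrative.

```python
from fractions import Fraction
from math import floor

def floor_and_descents(xs):
    """Return (floor of x_1+...+x_n, number of j with y_j < y_{j-1}), y_0 = 0."""
    total = Fraction(0)
    prev = Fraction(0)
    descents = 0
    for x in xs:
        total += x
        y = total % 1                # y_j = (x_1 + ... + x_j) mod 1
        if y < prev:                 # a descent occurs exactly when the sum wraps
            descents += 1
        prev = y
    return floor(total), descents
```

Feeding in lists of fractions such as `[Fraction(3, 4), Fraction(1, 2), Fraction(9, 10)]` should give matching components in the returned pair.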

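6.66 We have the general formula
$$\left\langle\!\!\left\langle{n\atop m}\right\rangle\!\!\right\rangle=\sum_{k=0}^m\binom{2n+1}{k}{n+m+1-k\brace m+1-k}(-1)^k,\qquad\text{for }n>m\ge0,$$
analogous to (6.38). When $m=2$ this equals
$$\left\langle\!\!\left\langle{n\atop2}\right\rangle\!\!\right\rangle={n+3\brace3}-(2n+1){n+2\brace2}+\binom{2n+1}{2}{n+1\brace1}=\tfrac12\,3^{n+2}-(2n+3)2^{n+1}+\tfrac12(4n^2+6n+3).$$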


6.67 $\frac13n(n+\frac12)(n+1)(2H_{2n}-H_n)-\frac1{36}n(10n^2+9n-1)$. (It would be nice to automate the derivation of formulas such as this.)
6.68 $1/k-1/(k+z)=z/k^2-z^2/k^3+\cdots$, and everything converges when $|z|<1$.

6.69 Note that $\prod_{k=1}^n(1+z/k)e^{-z/k}=\binom{n+z}{n}n^{-z}e^{(\ln n-H_n)z}$. If $f(z)=\frac{d}{dz}(z!)$ we find $f(z)/z!+\gamma=H_z$.

6.70 For $\tan z$, we can use $\tan z=\cot z-2\cot2z$ (which is equivalent to the identity of exercise 23). Also $z/\sin z=z\cot z+z\tan\frac12z$ has the power series $\sum_{n\ge0}(-1)^{n-1}(4^n-2)B_{2n}z^{2n}/(2n)!$; and
$$\ln\frac{\tan z}{z}=\ln\frac{\sin z}{z}-\ln\cos z=\sum_{n\ge1}(-1)^n\frac{4^nB_{2n}z^{2n}}{(2n)(2n)!}-\sum_{n\ge1}(-1)^n\frac{4^n(4^n-1)B_{2n}z^{2n}}{(2n)(2n)!}=\sum_{n\ge1}(-1)^{n-1}\frac{4^n(4^n-2)B_{2n}z^{2n}}{(2n)(2n)!},$$
because $\frac{d}{dz}\ln\sin z=\cot z$ and $\frac{d}{dz}\ln\cos z=-\tan z$.

6.71 Since $\tan2z+\sec2z=(\sin z+\cos z)/(\cos z-\sin z)$, setting $x=1$ in (6.94) gives $T_n(1)=2^nT_n$ when $n$ is odd, $T_n(1)=2^nE_n$ when $n$ is even, where $1/\cos z=\sum_{n\ge0}E_{2n}z^{2n}/(2n)!$. (The $E_n$ are called Euler numbers, not to be confused with the Eulerian numbers $\left\langle{n\atop k}\right\rangle$.)

6.72 $2^{n+1}(2^{n+1}-1)B_{n+1}/(n+1)$, if $n > 0$. (See (7.56) and (6.92); the desired numbers are essentially the coefficients of $1 - \tanh z$.)

6.73 $\cot(z+\pi) = \cot z$ and $\cot(z+\frac12\pi) = -\tan z$; hence the identity is equivalent to
$$\cot z = \frac{1}{2^n}\sum_{k=0}^{2^n-1}\cot\frac{z+k\pi}{2^n},$$
which follows by induction from the case $n = 1$. The stated limit follows since $z\cot z \to 1$ as $z \to 0$. It can be shown that term-by-term passage to the limit is justified, hence (6.88) is valid. (Incidentally, the general formula
$$\cot z = \frac1n\sum_{k=0}^{n-1}\cot\frac{z+k\pi}{n}$$
is also true. It can be proved from (6.88), or from
$$\frac{1}{e^{nz}-1} = \frac1n\sum_{k=0}^{n-1}\frac{1}{e^{z+2k\pi i/n}-1},$$
which is equivalent to the partial fraction expansion of $1/(z^n - 1)$.)
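A quick numerical sanity check of the general cotangent formula is easy to write. This is only a floating-point sketch with illustrative names, not a proof.

```python
import math

def cot(x):
    return math.cos(x) / math.sin(x)

def cot_average(z, n):
    """(1/n) * sum over k = 0..n-1 of cot((z + k*pi)/n)."""
    return sum(cot((z + k * math.pi) / n) for k in range(n)) / n
```

For any $z$ that is not a multiple of $\pi$, `cot_average(z, n)` should agree with `cot(z)` up to rounding error.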
6.74 If $p(x)$ is any polynomial of degree $\le n$, we have


because this equation holds for $x = 0, -1, \ldots, -n$. The stated identity is the special case where $p(x) = x\sigma_n(x)$ and $x = 1$. Incidentally, we obtain a simpler expression for Bernoulli numbers in terms of Stirling numbers by setting $k = 1$ in (6.99):


6.75  Sam  Loyd  [204,  pages  288  and  378]  gave  the  construction 


and  claimed  to  have  invented  (but  not  published)  the  64  = 65  arrangement 
in  1858.  (Similar  paradoxes  go  back  at  least  to  the  eighteenth  century,  but 
Loyd  found  better  ways  to  present  them.) 

6.76 We expect $A_m/A_{m-1} \approx \phi$, so we try $A_{m-1} = 618034 + r$ and $A_{m-2} = 381966 - r$. Then $A_{m-3} = 236068 + 2r$, etc., and we find $A_{m-18} = 144 - 2584r$, $A_{m-19} = 154 + 4181r$. Hence $r = 0$, $x = 154$, $y = 144$, $m = 20$.
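The backward computation above can be mechanized. The following is a sketch under two assumptions that are not stated in this answer: the sequence obeys $A_j = A_{j-1} + A_{j-2}$ with all terms positive, the target value is $A_m = 1000000$, and the terms are numbered starting from 1 (so a chain of 20 terms corresponds to $m = 20$). The helper names are hypothetical.

```python
def backward_terms(top, second):
    """Run A_{j-2} = A_j - A_{j-1} backwards from (top, second) while the
    terms stay positive; return the terms in forward order."""
    terms = [top, second]
    while True:
        nxt = terms[-2] - terms[-1]
        if nxt <= 0:
            break
        terms.append(nxt)
    return terms[::-1]

def best_start(target, candidates):
    """Among candidate values for the term just before `target`, pick the
    one that gives the longest backward chain."""
    best = max(candidates, key=lambda s: len(backward_terms(target, s)))
    return backward_terms(target, best)
```

Searching a window around $10^6/\phi \approx 618034$, for instance `best_start(10**6, range(617000, 619000))`, yields a chain that starts with 154 and 144.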

6.77 If $P(F_{n+1},F_n) = 0$ for infinitely many even values of $n$, then $P(x,y)$ is divisible by $U(x,y) - 1$, where $U(x,y) = x^2 - xy - y^2$. For if $t$ is the total degree of $P$, we can write
$$P(x,y) = \sum_{k=0}^{t}q_kx^ky^{t-k} + \sum_{j+k<t}r_{j,k}x^jy^k = Q(x,y) + R(x,y).$$
Then
$$\frac{P(F_{n+1},F_n)}{F_n^t} = \sum_{k=0}^{t}q_k\Bigl(\frac{F_{n+1}}{F_n}\Bigr)^k + O(1/F_n)$$
and we have $\sum_{k=0}^{t}q_k\phi^k = 0$ by taking the limit as $n \to \infty$. Hence $Q(x,y)$ is a multiple of $U(x,y)$, say $A(x,y)U(x,y)$. But $U(F_{n+1},F_n) = (-1)^n$ and $n$ is even, so $P_0(x,y) = P(x,y) - (U(x,y)-1)A(x,y)$ is another polynomial such that $P_0(F_{n+1},F_n) = 0$. The total degree of $P_0$ is less than $t$, so $P_0$ is a multiple of $U - 1$ by induction on $t$.

Similarly, $P(x,y)$ is divisible by $U(x,y) + 1$ if $P(F_{n+1},F_n) = 0$ for infinitely many odd values of $n$. A combination of these two facts gives the desired necessary and sufficient condition: $P(x,y)$ is divisible by $U(x,y)^2 - 1$.

Exercise: $m \circ n = mn + \lfloor(m+1)/\phi\rfloor n + m\lfloor(n+1)/\phi\rfloor$.

6.78 First add the digits without carrying, getting digits 0, 1, and 2. Then use the two carry rules

$0\,(d{+}1)\,(e{+}1) \to 1\,d\,e$,

$0\,(d{+}2)\,0\,e \to 1\,d\,0\,(e{+}1)$,

always applying the leftmost applicable carry. This process terminates because the binary value obtained by reading $(b_m\ldots b_2)_F$ as $(b_m\ldots b_2)_2$ increases whenever a carry is performed. But a carry might propagate to the right of the "Fibonacci point"; for example, $(1)_F + (1)_F$ becomes $(10.01)_F$. Such rightward propagation extends at most two positions; and those two digit positions can be zeroed again by using the text's "add 1" algorithm if necessary.

Incidentally, there's a corresponding "multiplication" operation on nonnegative integers: If $m = F_{j_1} + \cdots + F_{j_q}$ and $n = F_{k_1} + \cdots + F_{k_r}$ in the Fibonacci number system, let $m \circ n = \sum_{b=1}^{q}\sum_{c=1}^{r}F_{j_b+k_c}$, by analogy with multiplication of binary numbers. (This definition implies that $m \circ n \approx \sqrt5\,mn$ when $m$ and $n$ are large, although $1 \circ n \approx \phi^2n$.) Fibonacci addition leads to a proof of the associative law $l \circ (m \circ n) = (l \circ m) \circ n$.
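The circle product is easy to experiment with. The sketch below assumes that the Fibonacci number system uses indices $k \ge 2$ with no two consecutive indices, obtained greedily. It checks the product against the closed form quoted in the marginal exercise, using exact integer arithmetic for $\lfloor x/\phi\rfloor$. All function names are illustrative.

```python
from math import isqrt

def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def zeckendorf(n):
    """Indices k >= 2, no two consecutive, with n equal to the sum of F_k."""
    fibs = [0, 1]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    indices = []
    for k in range(len(fibs) - 1, 1, -1):
        if fibs[k] <= n:
            indices.append(k)
            n -= fibs[k]
    return indices

def circle(m, n):
    """Sum of F_{j+k} over the Fibonacci-system indices j of m and k of n."""
    return sum(fib(j + k) for j in zeckendorf(m) for k in zeckendorf(n))

def floor_div_phi(x):
    """floor(x / phi) = floor(x * (sqrt(5) - 1) / 2), computed exactly."""
    return (isqrt(5 * x * x) - x) // 2

def circle_closed(m, n):
    """mn + floor((m+1)/phi) n + m floor((n+1)/phi)."""
    return m * n + floor_div_phi(m + 1) * n + m * floor_div_phi(n + 1)
```

For example, `circle(4, 4)` adds $F_8 + F_6 + F_6 + F_4$, since $4 = F_4 + F_2$.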

6.79 Yes; for example, we can take

$A_0 = 331635635998274737472200656430763$;

$A_1 = 1510028911088401971189590305498785$.

The resulting sequence has the property that $A_n$ is divisible by (but unequal to) $p_k$ when $n \bmod m_k = r_k$, where the numbers $(p_k, m_k, r_k)$ have the following 18 respective values:

(3, 4, 1)          (2, 3, 2)          (5, 5, 1)
(7, 8, 3)          (17, 9, 4)         (11, 10, 2)
(47, 16, 7)        (19, 18, 10)       (61, 15, 3)
(2207, 32, 15)     (53, 27, 16)       (31, 30, 24)
(1087, 64, 31)     (109, 27, 7)       (41, 20, 10)
(4481, 64, 63)     (5779, 54, 52)     (2521, 60, 60)

One of these triples applies to every integer $n$; for example, the six triples in the first column cover every odd value of $n$, and the middle column covers all even $n$ that are not divisible by 6. The remainder of the proof is based on the fact that $A_{m+n} = A_mF_{n-1} + A_{m+1}F_n$, together with the congruences

$A_0 \equiv F_{m_k-r_k} \pmod{p_k}$,
$A_1 \equiv F_{m_k-r_k+1} \pmod{p_k}$,

for each of the triples $(p_k, m_k, r_k)$. (An improved solution, in which $A_0$ and $A_1$ are numbers of "only" 17 digits each, is also possible [184].)
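The covering property can be spot-checked by machine. The sketch below copies the two starting values and the 18 triples from the answer above and assumes the recurrence $A_{n+2} = A_{n+1} + A_n$; the function names are illustrative, and a remainder written as 60 modulo 60 is treated as 0.

```python
A0 = 331635635998274737472200656430763
A1 = 1510028911088401971189590305498785

TRIPLES = [
    (3, 4, 1), (7, 8, 3), (47, 16, 7), (2207, 32, 15), (1087, 64, 31), (4481, 64, 63),
    (2, 3, 2), (17, 9, 4), (19, 18, 10), (53, 27, 16), (109, 27, 7), (5779, 54, 52),
    (5, 5, 1), (11, 10, 2), (61, 15, 3), (31, 30, 24), (41, 20, 10), (2521, 60, 60),
]

def covering_prime(n):
    """Return some p_k whose congruence n mod m_k = r_k holds, else None."""
    for p, m, r in TRIPLES:
        if n % m == r % m:
            return p
    return None

def all_composite(limit):
    """Check that A_n has a proper prime divisor from the table for n < limit."""
    a, b = A0, A1
    for n in range(limit):
        p = covering_prime(n)
        if p is None or a % p != 0 or a == p:
            return False
        a, b = b, a + b
    return True
```

Calling `all_composite(300)` walks through the first 300 terms.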

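6.80 The matrix product is
$$\begin{pmatrix}K_{n-2}(x_2,\ldots,x_{n-1}) & K_{n-1}(x_2,\ldots,x_{n-1},x_n)\\ K_{n-1}(x_1,x_2,\ldots,x_{n-1}) & K_n(x_1,x_2,\ldots,x_{n-1},x_n)\end{pmatrix}.$$
This relates to products of $L$ and $R$ as in (6.137), because we have
$$R^a = \begin{pmatrix}1&0\\a&1\end{pmatrix} = \begin{pmatrix}0&1\\1&a\end{pmatrix}\begin{pmatrix}0&1\\1&0\end{pmatrix}, \qquad L^a = \begin{pmatrix}1&a\\0&1\end{pmatrix} = \begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}0&1\\1&a\end{pmatrix}.$$
The determinant is $K_n(x_1,\ldots,x_n)$; the more general tridiagonal determinant
$$D_n = \det\begin{pmatrix}x_1&1&0&\cdots&0\\ y_2&x_2&1&&0\\ 0&y_3&x_3&\ddots&\vdots\\ \vdots&&\ddots&\ddots&1\\ 0&0&\cdots&y_n&x_n\end{pmatrix}$$
satisfies the recurrence $D_n = x_nD_{n-1} - y_nD_{n-2}$.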
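The matrix identity can be tested on a small example. This sketch assumes that the product in question is $\bigl(\begin{smallmatrix}0&1\\1&x_1\end{smallmatrix}\bigr)\cdots\bigl(\begin{smallmatrix}0&1\\1&x_n\end{smallmatrix}\bigr)$, which is not restated in the answer, and that continuants follow the usual recurrence $K_n = x_nK_{n-1} + K_{n-2}$; the names are illustrative.

```python
def continuant(xs):
    """K(x_1, ..., x_n), with K() = 1."""
    prev, cur = 0, 1  # K_{-1} = 0, K_0 = 1
    for x in xs:
        prev, cur = cur, x * cur + prev
    return cur

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matrix_product(xs):
    """Product of the 2x2 matrices [[0, 1], [1, x]] for x in xs, in order."""
    result = [[1, 0], [0, 1]]
    for x in xs:
        result = matmul(result, [[0, 1], [1, x]])
    return result

def expected(xs):
    """The four continuants from the answer, valid for len(xs) >= 2."""
    return [[continuant(xs[1:-1]), continuant(xs[1:])],
            [continuant(xs[:-1]), continuant(xs)]]
```

For any list `xs` of length at least 2, `matrix_product(xs)` should equal `expected(xs)`.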

6.81 Let $\alpha^{-1} = a_0 + 1/(a_1 + 1/(a_2 + \cdots))$ be the continued fraction representation of $\alpha^{-1}$. Then we have
$$\frac{a_0}{z} + \cfrac{1}{A_0(z) + \cfrac{1}{A_1(z) + \cfrac{1}{A_2(z) + \cdots}}} = \frac{1-z}{z}\sum_{n\ge1}z^{\lfloor n\alpha\rfloor},$$
where
$$A_m(z) = \frac{z^{-q_{m+1}} - z^{-q_{m-1}}}{z^{-q_m} - 1}, \qquad q_m = K_m(a_1,\ldots,a_m).$$
A proof analogous to the text's proof of (6.146) uses a generalization of Zeckendorf's theorem (Fraenkel [104, §4]). If $z = 1/b$, where $b$ is an integer $\ge 2$, this gives the continued fraction representation of the transcendental number $(b-1)\sum_{n\ge1}b^{-\lfloor n\alpha\rfloor}$, as in exercise 49.

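6.82 The sequences of exercise 62 satisfy $A_{-m} = A_m$, $B_{-m} = -B_m$, and

$A_mA_n = A_{m+n} + A_{m-n}$,

$A_mB_n = B_{m+n} - B_{m-n}$,

$B_mB_n = A_{m+n} - A_{m-n}$.

Let $f_k = B_{mk}/A_{mk+l}$ and $g_k = A_{mk}/B_{mk+l}$, where $l = \frac12(n-m)$. Then $f_{k+1} - f_k = A_lB_m/(A_{2mk+n} + A_m)$ and $g_k - g_{k+1} = A_lB_m/(A_{2mk+n} - A_m)$; hence we have
$$S^+_{m,n} = \frac{\sqrt5}{A_lB_m}\lim_{k\to\infty}(f_k - f_0) = \frac{\sqrt5}{\phi^lA_lL_m};$$
$$S^-_{m,n} = \frac{\sqrt5}{A_lB_m}\lim_{k\to\infty}(g_0 - g_k) = \frac{\sqrt5}{A_lL_m}\Bigl(\frac{2}{B_l} - \frac{1}{\phi^l}\Bigr) = \frac{2}{F_lL_lL_m} - S^+_{m,n}.$$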


6.83 Let $p = K(0,a_1,a_2,\ldots,a_m)$, so that $p/n$ is the $m$th convergent to the continued fraction. Then $\alpha = p/n + (-1)^m/nq$, where $q = K(a_1,\ldots,a_m,\beta)$ and $\beta > 1$. The points $\{k\alpha\}$ for $0 \le k < n$ can therefore be written
$$\frac0n,\ \frac1n + \frac{(-1)^m\pi_1}{nq},\ \ldots,\ \frac{n-1}{n} + \frac{(-1)^m\pi_{n-1}}{nq},$$
where $\pi_1\ldots\pi_{n-1}$ is a permutation of $\{1,\ldots,n-1\}$. Let $f(v)$ be the number of such points $< v$; then $f(v)$ and $vn$ both increase by 1 when $v$ increases from $k/n$ to $(k+1)/n$, except when $k = 0$ or $k = n-1$, so they never differ by 2 or more.

6.84 By (6.139) and (6.136), we want to maximize $K(a_1,\ldots,a_m)$ over all sequences of positive integers whose sum is $\le n+1$. The maximum occurs when all the $a$'s are 1, for if $j \ge 1$ and $a \ge 1$ we have

$K_{j+k+1}(1,\ldots,1,a+1,b_1,\ldots,b_k)$

$\quad = K_{j+k+1}(1,\ldots,1,a,b_1,\ldots,b_k) + K_j(1,\ldots,1)\,K_k(b_1,\ldots,b_k)$

$\quad \le K_{j+k+1}(1,\ldots,1,a,b_1,\ldots,b_k) + K_{j+k}(1,\ldots,1,a,b_1,\ldots,b_k)$

$\quad = K_{j+k+2}(1,\ldots,1,a,b_1,\ldots,b_k)$.

(Motzkin and Straus [220] solve more general maximization problems on continuants.)


6.85 The property holds if and only if $N$ has one of the seven forms $5^k$, $2\cdot5^k$, $4\cdot5^k$, $3^j\cdot5^k$, $6\cdot5^k$, $7\cdot5^k$, $14\cdot5^k$.

6.86 A candidate for the case $n \bmod 1 = \frac12$ appears in [179, section 6], although it may be best to multiply the integers discussed there by some constant involving $\sqrt\pi$.

6.87 (a) If there are only finitely many solutions, it is natural to conjecture that the same holds for all primes. (b) The behavior of $b_n$ is quite strange: We have $b_n = \mathrm{lcm}(1,\ldots,n)$ for $968 \le n \le 1066$; on the other hand, $b_{600} = \mathrm{lcm}(1,\ldots,600)/(3^3\cdot5^2\cdot43)$. Andrew Odlyzko observes that $p$ divides $\mathrm{lcm}(1,\ldots,n)/b_n$ if and only if $kp^m \le n < (k+1)p^m$ for some $m \ge 1$ and some $k < p$ such that $p$ divides the numerator of $H_k$. Therefore infinitely many such $n$ exist if it can be shown, for example, that almost all primes have only one such value of $k$ (namely $k = p-1$).
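These observations about $b_n$ are easy to reproduce with exact rational arithmetic. The sketch below assumes that $b_n$ denotes the denominator of $H_n$ in lowest terms, which is an assumption because the exercise statement is not repeated here; the function names are illustrative and `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm

def harmonic_denominators(limit):
    """Return {n: denominator of H_n in lowest terms} for 1 <= n <= limit."""
    h = Fraction(0)
    result = {}
    for n in range(1, limit + 1):
        h += Fraction(1, n)
        result[n] = h.denominator
    return result

def lcm_upto(n):
    """lcm(1, 2, ..., n)."""
    return lcm(*range(1, n + 1))
```

Comparing `harmonic_denominators(1066)[n]` with `lcm_upto(n)` for the values of $n$ mentioned above shows the behavior described.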

6.88 (Brent [33] found the surprisingly large partial quotient 1568705 in $e^\gamma$, but this seems to be just a coincidence. For example, Gosper has found even larger partial quotients in $\pi$: The 453,294th is 12996958 and the 11,504,931st is 878783625.)

6.89 Consider the generating function $\sum_{m,n\ge0}\left|{m+n\atop m}\right|w^mz^n$, which has the form $\sum_n\bigl(wF(a,b,c) + zF(a',b',c')\bigr)^n$, where $F(a,b,c)$ is the differential operator $a + b\vartheta_w + c\vartheta_z$.

7.1 Substitute $z^4$ for ▯ and $z$ for ▭ in the generating function, getting $1/(1 - z^4 - z^2)$. This is like the generating function for $T$, but with $z$ replaced by $z^2$. Therefore the answer is zero if $m$ is odd, otherwise $F_{m/2+1}$.
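The coefficients of $1/(1 - z^4 - z^2)$ can be tabulated directly from the recurrence $c_k = c_{k-2} + c_{k-4}$ that the denominator implies. This is a small sketch with illustrative names, showing the zero/Fibonacci pattern claimed above.

```python
def coefficient(m):
    """[z^m] 1/(1 - z^4 - z^2), via c_k = c_{k-2} + c_{k-4} with c_0 = 1."""
    c = [0] * (m + 1)
    c[0] = 1
    for k in range(1, m + 1):
        c[k] = (c[k - 2] if k >= 2 else 0) + (c[k - 4] if k >= 4 else 0)
    return c[m]

def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a
```

Odd $m$ give 0, and even $m$ give `fib(m // 2 + 1)`.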

7.2 $G(z) = 1/(1-2z) + 1/(1-3z)$; $\hat G(z) = e^{2z} + e^{3z}$.

7.3 Set $z = 1/10$ in the generating function, getting $\frac{10}{9}\ln\frac{10}{9}$.

7.4 Divide $P(z)$ by $Q(z)$, getting a quotient $T(z)$ and a remainder $P_0(z)$ whose degree is less than the degree of $Q$. The coefficients of $T(z)$ must be added to the coefficients $[z^n]\,P_0(z)/Q(z)$ for small $n$. (This is the polynomial $T(z)$ in (7.28).)

7.5 This is the convolution of $(1+z^2)^r$ with $(1+z)^r$, so
$S(z) = (1 + z + z^2 + z^3)^r$.

Incidentally,  no  simple  form  is  known  for  the  coefficients  of  this  generating 
function;  hence  the  stated  sum  probably  has  no  simple  closed  form.  (We  can 
use  generating  functions  to  obtain  negative  results  as  well  as  positive  ones.) 


Another  reason  to 
remember  1066? 
7.6 Let the solution to $g_0 = \alpha$, $g_1 = \beta$, $g_n = g_{n-1} + 2g_{n-2} + (-1)^n\gamma$ be $g_n = A(n)\alpha + B(n)\beta + C(n)\gamma$. The function $2^n$ works when $\alpha = 1$, $\beta = 2$, $\gamma = 0$; the function $(-1)^n$ works when $\alpha = 1$, $\beta = -1$, $\gamma = 0$; the function $(-1)^nn$ works when $\alpha = 0$, $\beta = -1$, $\gamma = 3$. Hence $A(n) + 2B(n) = 2^n$, $A(n) - B(n) = (-1)^n$, and $-B(n) + 3C(n) = (-1)^nn$.
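Solving the three linear equations gives $B(n) = (2^n - (-1)^n)/3$, $A(n) = (-1)^n + B(n)$, and $C(n) = ((-1)^nn + B(n))/3$. The following sketch (illustrative names, exact rational arithmetic) compares that closed form with the recurrence.

```python
from fractions import Fraction

def closed_form(n, alpha, beta, gamma):
    sign = (-1) ** n
    B = Fraction(2 ** n - sign, 3)   # from A + 2B = 2^n and A - B = (-1)^n
    A = sign + B                     # from A - B = (-1)^n
    C = (sign * n + B) / 3           # from -B + 3C = (-1)^n n
    return A * alpha + B * beta + C * gamma

def by_recurrence(n, alpha, beta, gamma):
    g = [alpha, beta]
    for k in range(2, n + 1):
        g.append(g[k - 1] + 2 * g[k - 2] + (-1) ** k * gamma)
    return g[n]
```

The two functions should agree for every choice of the three parameters.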

7.7 $G(z) = \bigl(z/(1-z)^2\bigr)G(z) + 1$, hence
$$G(z) = \frac{1-2z+z^2}{1-3z+z^2} = 1 + \frac{z}{1-3z+z^2};$$
we have $g_n = F_{2n} + [n=0]$.

I bet that the controversial "fan of order zero" does have one spanning tree.

7.8 Differentiate $(1-z)^{-x-1}$ twice with respect to $x$, obtaining
$$\binom{x+n}{n}\Bigl((H_{x+n} - H_x)^2 - (H^{(2)}_{x+n} - H^{(2)}_x)\Bigr).$$
Now set $x = m$.

7.9 $(n+1)(H_n^2 - H_n^{(2)}) - 2n(H_n - 1)$.

7.10 The identity $H_{k-1/2} - H_{-1/2} = \frac{2}{2k-1} + \cdots + \frac21 = 2H_{2k} - H_k$ implies that $\sum_k\binom{2k}{k}\binom{2n-2k}{n-k}(2H_{2k} - H_k) = 4^nH_n$.

7.11 (a) $C(z) = A(z)B(z^2)/(1-z)$. (b) $zB'(z) = A(2z)e^z$, hence $A(z) = \frac z2e^{-z/2}B'(\frac z2)$. (c) $A(z) = B(z)/(1-z)^{r+1}$, hence $B(z) = (1-z)^{r+1}A(z)$ and we have $f_k(r) = \binom{r+1}{k}(-1)^k$.

7.12 $C_n$. The numbers in the upper row correspond to the positions of $+1$'s in a sequence of $+1$'s and $-1$'s that defines a "mountain range"; the numbers in the lower row correspond to the positions of $-1$'s. For example, the given array corresponds to


7.13 Extend the sequence periodically (let $x_{m+k} = x_k$) and define $s_n = x_1 + \cdots + x_n$. We have $s_m = l$, $s_{2m} = 2l$, etc. There must be a largest index $k_j$ such that $s_{k_j} = j$, $s_{k_j+m} = l+j$, etc. These indices $k_1, \ldots, k_l$ (modulo $m$) specify the cyclic shifts in question.

For example, in the sequence $\langle -2, 1, -1, 0, 1, 1, -1, 1, 1, 1\rangle$ with $m = 10$ and $l = 2$ we have $k_1 = 17$, $k_2 = 24$.
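The example can be replayed mechanically. The sketch below assumes integer sequences with positive sum $l$ whose terms never exceed $+1$ (so that every level $1, \ldots, l$ is actually reached), and it takes "the cyclic shifts in question" to be those whose partial sums are all positive; both points are assumptions, since the exercise statement is not repeated here. The function names are illustrative.

```python
def last_indices(xs):
    """Return [k_1, ..., k_l], where k_j is the largest n with s_n = j
    in the periodic extension of xs."""
    m, l = len(xs), sum(xs)
    low = min(sum(xs[:i]) for i in range(1, m + 1))  # least s_n in the first period
    # In period q (counting from 0) every s_n is >= low + l*q, so values <= l
    # can no longer occur once low + l*q > l.
    periods = (l - low) // l + 1
    last, s = {}, 0
    for n in range(1, periods * m + 1):
        s += xs[(n - 1) % m]
        if 1 <= s <= l:
            last[s] = n
    return [last[j] for j in range(1, l + 1)]

def positive_shifts(xs):
    """Shift amounts t for which xs[t:] + xs[:t] has all partial sums > 0."""
    good = []
    for t in range(len(xs)):
        s, ok = 0, True
        for x in xs[t:] + xs[:t]:
            s += x
            if s <= 0:
                ok = False
                break
        if ok:
            good.append(t)
    return good
```

For the sequence in the example, the indices reduce modulo 10 to the shifts found by `positive_shifts`.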

7.14 $\hat G(z) = -2z\hat G(z) + \hat G(z)^2 + z$ (be careful about the final term!) leads via the quadratic formula to
$$\hat G(z) = \frac{1 + 2z - \sqrt{1+4z^2}}{2}.$$
Hence $g_{2n+1} = 0$ and $g_{2n} = (-1)^n(2n)!\,C_{n-1}$, for all $n > 0$.

7.15 There are $\binom nkb_{n-k}$ partitions with $k$ other objects in the subset containing $n+1$. Hence $B'(z) = e^zB(z)$. The solution to this differential equation is $B(z) = e^{e^z+c}$, and $c = -1$ since $B(0) = 1$. (We can also get this result by summing (7.49) on $m$, since $b_n = \sum_m\left\{{n\atop m}\right\}$.)

7.16 One way is to take the logarithm of
$$B(z) = 1/\bigl((1-z)^{a_1}(1-z^2)^{a_2}(1-z^3)^{a_3}(1-z^4)^{a_4}\ldots\bigr),$$
then use the formula for $\ln\frac{1}{1-z}$ and interchange the order of summation.

7.17 This follows since $\int_0^\infty t^ne^{-t}\,dt = n!$. There's also a formula that goes in the other direction:
$$\hat G(z) = \frac{1}{2\pi}\int_{-\pi}^{\pi}G(ze^{-i\theta})\,e^{e^{i\theta}}\,d\theta.$$

7.18 (a) $\zeta(z - \frac12)$; (b) $-\zeta'(z)$; (c) $\zeta(z)/\zeta(2z)$. Every positive integer is uniquely representable as $m^2q$, where $q$ is squarefree.

7.19 If $n > 0$, the coefficient $[z^n]\exp(x\ln F(z))$ is a polynomial of degree $n$ in $x$ that's a multiple of $x$. The first convolution formula comes from equating coefficients of $z^n$ in $F(z)^xF(z)^y = F(z)^{x+y}$. The second comes from equating coefficients of $z^{n-1}$ in $F'(z)F(z)^{x-1}F(z)^y = F'(z)F(z)^{x+y-1}$, because we have
$$F'(z)F(z)^{x-1} = x^{-1}\frac{\partial}{\partial z}\bigl(F(z)^x\bigr) = x^{-1}\sum_{n\ge0}nf_n(x)z^{n-1}.$$
(Further convolutions follow by taking $\partial/\partial x$, as in (7.43).)

7.20 Let $G(z) = \sum_{n\ge0}g_nz^n$. Then
$$z^lG^{(k)}(z) = \sum_{n\ge0}n^{\underline k}g_nz^{n-k+l} = \sum_{n\ge0}(n+k-l)^{\underline k}g_{n+k-l}z^n$$
for all $k, l \ge 0$, if we regard $g_n = 0$ for $n < 0$. Hence if $P_0(z), \ldots, P_m(z)$ are polynomials, not all zero, having maximum degree $d$, then there are polynomials $p_0(n), \ldots, p_{m+d}(n)$ such that
$$P_0(z)G(z) + \cdots + P_m(z)G^{(m)}(z) = \sum_{n\ge0}\sum_{j=0}^{m+d}p_j(n)g_{n+j-d}z^n.$$
Therefore a differentiably finite $G(z)$ implies that
$$\sum_{j=0}^{m+d}p_j(n+d)g_{n+j} = 0, \qquad\text{for all } n \ge 0.$$
The converse is similar. (One consequence is that $G(z)$ is differentiably finite if and only if the corresponding egf, $\hat G(z)$, is differentiably finite.)


This  slow  method  of 
finding  the  answer 
is  just  the  cashier's 
way  of  stalling  until 
the  police  come. 

The  USA  has 
two-cent  pieces,  but 
they  haven’t  been 
minted  since  1873. 


[Figure: an example of the pasting operation $A\triangle B$.]

(The  polygons  might  need  to  be  warped  a bit  and/or  banged  into  shape.) 
Every  triangulation  arises  in  this  way,  because  the  base  line  is  part  of  a unique 
triangle  and  there  are  triangulated  polygons  A and  B at  its  left  and  right. 

Replacing each triangle by $z$ gives a power series in which the coefficient of $z^n$ is the number of triangulations with $n$ triangles, namely the number of ways to decompose an $(n+2)$-gon into triangles. Since $P = 1 + zP^2$, this is the generating function for Catalan numbers $C_0 + C_1z + C_2z^2 + \cdots$; the number of ways to triangulate an $n$-gon is $C_{n-2} = \binom{2n-4}{n-2}/(n-1)$.

7.23 Let $a_n$ be the stated number, and $b_n$ the number of ways with a $2\times1\times1$ notch missing at the top. By considering the possible patterns visible on the top surface, we have

$a_n = 2a_{n-1} + 4b_{n-1} + a_{n-2} + [n=0]$;
$b_n = a_{n-1} + b_{n-1}$.

Hence the generating functions satisfy $A = 2zA + 4zB + z^2A + 1$, $B = zA + zB$, and we have

7.21 This is the problem of giving change with denominations 10 and 20, so $G(z) = 1/\bigl((1-z^{10})(1-z^{20})\bigr) = \check G(z^{10})$, where $\check G(z) = 1/\bigl((1-z)(1-z^2)\bigr)$. (a) The partial fraction decomposition of $\check G(z)$ is $\frac12(1-z)^{-2} + \frac14(1-z)^{-1} + \frac14(1+z)^{-1}$, so $[z^n]\,\check G(z) = \frac14(2n + 3 + (-1)^n)$. Setting $n = 50$ yields 26 ways to make the payment. (b) $\check G(z) = (1+z)/(1-z^2)^2 = (1+z)(1 + 2z^2 + 3z^4 + \cdots)$, so $[z^n]\,\check G(z) = \lfloor n/2\rfloor + 1$. (Compare this with the value $N_n = \lfloor n/5\rfloor + 1$ in the text's coin-changing problem. The bank robber's problem is equivalent to the problem of making change with pennies and tuppences.)

7.22 Each polygon has a "base" (the line segment at the bottom). If $A$ and $B$ are triangulated polygons, let $A\triangle B$ be the result of pasting the base of $A$ to the upper left diagonal of $\triangle$, and pasting the base of $B$ to the upper right diagonal. Thus, for example,


$$A(z) = \frac{1-z}{(1+z)(1-4z+z^2)}.$$
This formula relates to the problem of $3\times n$ domino tilings; we have $a_n = \frac13(U_{2n} + V_{2n+1} + (-1)^n) = \frac16(2+\sqrt3)^{n+1} + \frac16(2-\sqrt3)^{n+1} + \frac13(-1)^n$, which is $(2+\sqrt3)^{n+1}/6$ rounded to the nearest integer.
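The pair of recurrences for $a_n$ and $b_n$ from answer 7.23 can be iterated directly and compared with the rounding rule. The sketch below assumes the initial values $a_0 = 1$ and $b_0 = 0$, which follow from the $[n=0]$ term; it uses floating point only for the rounded power, and the names are illustrative.

```python
from math import sqrt

def a_sequence(count):
    """First `count` values of a_n from a_n = 2a_{n-1} + 4b_{n-1} + a_{n-2},
    b_n = a_{n-1} + b_{n-1}, starting from a_0 = 1, b_0 = 0."""
    a, b = [1], [0]
    for n in range(1, count):
        a_prev2 = a[n - 2] if n >= 2 else 0
        a.append(2 * a[n - 1] + 4 * b[n - 1] + a_prev2)
        b.append(a[n - 1] + b[n - 1])
    return a

def rounded_power(n):
    """(2 + sqrt 3)^(n+1) / 6 rounded to the nearest integer."""
    return round((2 + sqrt(3)) ** (n + 1) / 6)
```

The first few values are 1, 2, 9, 32.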
7.24 $n\sum_{k_1+\cdots+k_m=n}k_1\cdot\ldots\cdot k_m/m = F_{2n+1} + F_{2n-1} - 2$. (Consider the coefficient $[z^{n-1}]\,\frac{d}{dz}\ln\bigl(1/(1-G(z))\bigr)$, where $G(z) = z/(1-z)^2$.)

7.25 The generating function is $P(z)/(1-z^m)$, where $P(z) = z + 2z^2 + \cdots + (m-1)z^{m-1} = \bigl((m-1)z^{m+1} - mz^m + z\bigr)/(1-z)^2$. The denominator is $Q(z) = 1 - z^m = (1-\omega^0z)(1-\omega^1z)\ldots(1-\omega^{m-1}z)$. By the rational expansion theorem for distinct roots, we obtain
$$n \bmod m = \frac{m-1}{2} + \sum_{k=1}^{m-1}\frac{\omega^{-kn}}{\omega^k - 1}.$$
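The roots-of-unity formula for $n \bmod m$ can be checked numerically. This is a floating-point sketch that assumes $\omega = e^{2\pi i/m}$; the function name is illustrative.

```python
import cmath

def mod_via_roots(n, m):
    """(m-1)/2 + sum over k = 1..m-1 of omega^(-kn) / (omega^k - 1)."""
    omega = cmath.exp(2j * cmath.pi / m)
    return (m - 1) / 2 + sum(omega ** (-k * n) / (omega ** k - 1)
                             for k in range(1, m))
```

The result is a complex number whose imaginary part vanishes up to rounding error and whose real part is $n \bmod m$.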


7.26 $(1 - z - z^2)\mathfrak F(z) = F(z)$ leads to $\mathfrak F_n = \bigl(2(n+1)F_n + nF_{n+1}\bigr)/5$ as in equation (7.60).

7.27  Each  oriented  cycle  pattern  begins  with  $ or  “ or  a 2 x k cycle  (for 
some $k \ge 2$) oriented in one of two ways. Hence

$Q_n = Q_{n-1} + Q_{n-2} + 2Q_{n-2} + 2Q_{n-3} + \cdots + 2Q_0$

for $n \ge 2$; $Q_0 = Q_1 = 1$. The generating function is therefore
$$Q(z) = zQ(z) + z^2Q(z) + 2z^2Q(z)/(1-z) + 1 = \frac{1}{1 - z - z^2 - 2z^2/(1-z)} = \frac{1-z}{1 - 2z - 2z^2 + z^3} = \frac{\phi^2/5}{1-\phi^2z} + \frac{\phi^{-2}/5}{1-\phi^{-2}z} + \frac{2/5}{1+z},$$
and $Q_n = \bigl(\phi^{2n+2} + \phi^{-2n-2} + 2(-1)^n\bigr)/5 = \bigl((\phi^{n+1} - \hat\phi^{n+1})/\sqrt5\bigr)^2 = F_{n+1}^2$.

7.28 In general if $A(z) = (1 + z + \cdots + z^{m-1})B(z)$, we have $A_r + A_{r+m} + A_{r+2m} + \cdots = B(1)$ for $0 \le r < m$. In this case $m = 10$ and $B(z) = (1 + z + \cdots + z^9)(1 + z^2 + z^4 + z^6 + z^8)(1 + z^5)$.

7.29 $F(z) + F(z)^2 + F(z)^3 + \cdots = z/(1 - z - z^2 - z) = \bigl(1/(1-(1+\sqrt2)z) - 1/(1-(1-\sqrt2)z)\bigr)/\sqrt8$, so the answer is $\bigl((1+\sqrt2)^n - (1-\sqrt2)^n\bigr)/\sqrt8$.

7.30 $\sum_{k=1}^{n}\binom{2n-1-k}{n-1}\bigl(a^nb^{n-k}/(1-\alpha z)^k + a^{n-k}b^n/(1-\beta z)^k\bigr)$, by exercise 5.39.

7.31 The dgf is $\zeta(z)^2/\zeta(z-1)$; hence we find $g(n)$ is the product of $(k+1-kp)$ over all prime powers $p^k$ that exactly divide $n$.

7.32 We may assume that each $b_k \ge 0$. A set of arithmetic progressions forms an exact cover if and only if
$$\frac{1}{1-z} = \frac{z^{b_1}}{1-z^{a_1}} + \cdots + \frac{z^{b_m}}{1-z^{a_m}}.$$
Subtract $z^{b_m}/(1-z^{a_m})$ from both sides and set $z = e^{2\pi i/a_m}$. The left side is infinite, and the right side will be finite unless $a_{m-1} = a_m$.

7.33 $(-1)^{n-m+1}[n > m]/(n-m)$.

7.34 We can also write $G_n(z) = \sum_{k_1+(m+1)k_{m+1}=n}\binom{k_1+k_{m+1}}{k_{m+1}}(z^m)^{k_{m+1}}$. In general, if
$$G_n = \sum_{k_1+2k_2+\cdots+rk_r=n}\binom{k_1+k_2+\cdots+k_r}{k_1,k_2,\ldots,k_r}z_1^{k_1}z_2^{k_2}\ldots z_r^{k_r},$$
we have $G_n = z_1G_{n-1} + z_2G_{n-2} + \cdots + z_rG_{n-r} + [n=0]$, and the generating function is $1/(1 - z_1w - z_2w^2 - \cdots - z_rw^r)$. In the stated special case the answer is $1/(1 - w - z^mw^{m+1})$. (See (5.74) for the case $m = 1$.)

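7.35 (a) $\frac1n\sum_{0<k<n}\bigl(1/k + 1/(n-k)\bigr) = \frac2nH_{n-1}$. (b) $[z^n]\bigl(\ln\frac{1}{1-z}\bigr)^2 = \frac{2!}{n!}\left[{n\atop2}\right] = \frac2nH_{n-1}$ by (7.50) and (6.58). Another way to do part (b) is to use the rule $[z^n]\,F(z) = \frac1n[z^{n-1}]\,F'(z)$ with $F(z) = \bigl(\ln\frac{1}{1-z}\bigr)^2$.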

7.36 $\frac{1-z^m}{1-z}A(z^m)$.

7.37 (a) The amazing identity $a_{2n} = a_{2n+1} = b_n$ holds in the table

n     0   1   2   3   4   5   6   7   8   9  10
a_n   1   1   2   2   4   4   6   6  10  10  14
b_n   1   2   4   6  10  14  20  26  36  46  60

(b) $A(z) = 1/\bigl((1-z)(1-z^2)(1-z^4)(1-z^8)\ldots\bigr)$. (c) $B(z) = A(z)/(1-z)$, and we want to show that $A(z) = (1+z)B(z^2)$. This follows from $A(z) = A(z^2)/(1-z)$.

7.38 $(1-wz)M(w,z) = \sum_{m,n\ge1}\bigl(\min(m,n) - \min(m-1,n-1)\bigr)w^mz^n = \sum_{m,n\ge1}w^mz^n = wz/\bigl((1-w)(1-z)\bigr)$. In general,
$$M(z_1,\ldots,z_m) = \frac{z_1\ldots z_m}{(1-z_1)\ldots(1-z_m)(1-z_1\ldots z_m)}.$$


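7.39 The answers to the hint are
$$\sum_{1\le k_1<k_2<\cdots<k_m\le n}a_{k_1}a_{k_2}\ldots a_{k_m} \qquad\text{and}\qquad \sum_{1\le k_1\le k_2\le\cdots\le k_m\le n}a_{k_1}a_{k_2}\ldots a_{k_m},$$
respectively. Therefore: (a) We want the coefficient of $z^m$ in the product $(1+z)(1+2z)\ldots(1+nz)$. This is the reflection of $(z+1)^{\overline n}$, so it is $\left[{n+1\atop n+1}\right] + \left[{n+1\atop n}\right]z + \cdots + \left[{n+1\atop1}\right]z^n$ and the answer is $\left[{n+1\atop n+1-m}\right]$. (b) The coefficient of $z^m$ in $1/\bigl((1-z)(1-2z)\ldots(1-nz)\bigr)$ is $\left\{{m+n\atop n}\right\}$ by (7.47).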
7.40 The egf for $\langle nF_{n-1} - F_n\rangle$ is $(z-1)\hat F(z)$ where $\hat F(z) = \sum_{n\ge0}F_nz^n/n! = (e^{\phi z} - e^{\hat\phi z})/\sqrt5$. The egf for $\langle n¡\rangle$ is $e^{-z}/(1-z)$. The product is
$$5^{-1/2}\bigl(e^{(\hat\phi-1)z} - e^{(\phi-1)z}\bigr) = 5^{-1/2}\bigl(e^{-\phi z} - e^{-\hat\phi z}\bigr).$$
We have $\hat F(z)e^{-z} = -\hat F(-z)$. So the answer is $(-1)^nF_n$.

7.41 The number of up-down permutations with the largest element $n$ in position $2k$ is $\binom{n-1}{2k-1}A_{2k-1}A_{n-2k}$. Similarly, the number of up-down permutations with the smallest element 1 in position $2k+1$ is $\binom{n-1}{2k}A_{2k}A_{n-2k-1}$, because down-up permutations and up-down permutations are equally numerous. Summing over all possibilities gives
$$2A_n = \sum_k\binom{n-1}{k}A_kA_{n-1-k} + 2[n=0] + [n=1].$$
The egf $\hat A$ therefore satisfies $2\hat A'(z) = \hat A(z)^2 + 1$ and $\hat A(0) = 1$; the given function solves this differential equation.

7.42 Let $a_n$ be the number of Martian DNA strings that don't end with c or e; let $b_n$ be the number that do. Then

$a_n = 3a_{n-1} + 2b_{n-1} + [n=0]$,  $b_n = 2a_{n-1} + b_{n-1}$;

$A(z) = 3zA(z) + 2zB(z) + 1$,  $B(z) = 2zA(z) + zB(z)$;

$$A(z) = \frac{1-z}{1-4z-z^2}, \qquad B(z) = \frac{2z}{1-4z-z^2};$$

and the total number is $[z^n]\,(1+z)/(1-4z-z^2) = F_{3n+2}$.
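The two recurrences are easy to iterate, which gives a quick check of the Fibonacci closed form. This is a minimal sketch with illustrative names, taking $a_0 = 1$ and $b_0 = 0$ as implied by the $[n=0]$ term.

```python
def martian_totals(count):
    """a_n + b_n for n = 0..count-1, from a_n = 3a_{n-1} + 2b_{n-1} and
    b_n = 2a_{n-1} + b_{n-1}, with a_0 = 1 and b_0 = 0."""
    a, b = 1, 0
    totals = []
    for _ in range(count):
        totals.append(a + b)
        a, b = 3 * a + 2 * b, 2 * a + b
    return totals

def fib(k):
    x, y = 0, 1
    for _ in range(k):
        x, y = y, x + y
    return x
```

The totals begin 1, 5, 21, matching $F_2$, $F_5$, $F_8$.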

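7.43 By (5.45), $g_n = \Delta^nG(0)$. The $n$th difference of a product can be written
$$\Delta^nA(z)B(z) = \sum_k\binom nk\bigl(\Delta^kE^{n-k}A(z)\bigr)\bigl(\Delta^{n-k}B(z)\bigr),$$
and $E^{n-k} = (1+\Delta)^{n-k} = \sum_j\binom{n-k}{j}\Delta^j$. Therefore we find
$$h_n = \sum_{j,k}\binom nk\binom{n-k}{j}f_{j+k}\,g_{n-k}.$$
This is a sum over all trinomial coefficients; it can be put into the more symmetric form
$$h_n = \sum_{j+k+l=n}\binom{n}{j,k,l}f_{j+k}\,g_{k+l}.$$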




The  empty  set 

is  pointless. 


7.44 Each partition into $k$ nonempty subsets can be ordered in $k!$ ways, so $b_k = k!$. Thus $\hat Q(z) = \sum_{n,k\ge0}\left\{{n\atop k}\right\}k!\,z^n/n! = \sum_{k\ge0}(e^z-1)^k = 1/(2-e^z)$. And this is the geometric series $\sum_{k\ge0}e^{kz}/2^{k+1}$, hence $a_k = 1/2^{k+1}$. Finally, $c_k = 2^k$; consider all permutations when the $x$'s are distinct, change each '>' between subscripts to '<' and allow each '<' between subscripts to become either '<' or '='. (For example, the permutation $x_1x_3x_2$ produces $x_1 < x_3 < x_2$ and $x_1 = x_3 < x_2$, because $1 < 3 > 2$.)

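7.45 This sum is $\sum_{n\ge1}r(n)/n^2$, where $r(n)$ is the number of ways to write $n$ as a product of two relatively prime factors. If $n$ is divisible by $t$ distinct primes, $r(n) = 2^t$. Hence $r(n)/n^2$ is multiplicative and the sum is
$$\prod_p\Bigl(1 + \frac{2}{p^2} + \frac{2}{p^4} + \cdots\Bigr) = \prod_p\frac{p^2+1}{p^2-1} = \frac{\zeta(2)^2}{\zeta(4)} = \frac52.$$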


7.46 Let $S_n = \sum_{0\le k\le n/2}\binom{n-2k}{k}\alpha^k$. Then $S_n = S_{n-1} + \alpha S_{n-3} + [n=0]$, and the generating function is $1/(1 - z - \alpha z^3)$. When $\alpha = -\frac{4}{27}$, the hint tells us that this has a nice factorization $1/\bigl((1+\frac13z)(1-\frac23z)^2\bigr)$. The general expansion theorem now yields $S_n = (\frac23n + c)(\frac23)^n + \frac19(-\frac13)^n$, and the remaining constant $c$ turns out to be $\frac89$.

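7.47 The Stern–Brocot representation of $\sqrt3$ is $R(LR^2)^\infty$, because
$$\sqrt3 + 1 = 2 + \cfrac{1}{1 + \cfrac{1}{\sqrt3+1}}.$$
The fractions are $\frac21, \frac32, \frac53, \frac74, \frac{12}{7}, \frac{19}{11}, \frac{26}{15}, \ldots$; they eventually have the cyclic pattern
$$\frac{V_{2n-1}+V_{2n+1}}{U_{2n}},\quad \frac{U_{2n}+V_{2n+1}}{V_{2n+1}},\quad \frac{U_{2n+2}+V_{2n-1}}{U_{2n}+V_{2n+1}},\quad \frac{V_{2n+1}+V_{2n+3}}{U_{2n+2}},\quad\ldots.$$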

7.48 We have $g_0 = 0$, and if $g_1 = m$ the generating function satisfies
$$aG(z) + bz^{-1}G(z) + cz^{-2}\bigl(G(z) - mz\bigr) + \frac{d}{1-z} = 0.$$
Hence $G(z) = P(z)/\bigl((az^2+bz+c)(1-z)\bigr)$ for some polynomial $P(z)$. Let $\rho_1$ and $\rho_2$ be the roots of $cz^2 + bz + a$, with $|\rho_1| \ge |\rho_2|$. If $b^2 - 4ac \le 0$ then $|\rho_1|^2 = \rho_1\rho_2 = a/c$ is rational, contradicting the fact that $\sqrt[n]{g_n}$ approaches $1+\sqrt2$. Hence $\rho_1 = (-b + \sqrt{b^2-4ac})/2c = 1+\sqrt2$; and this implies that $a = -c$, $b = -2c$, $\rho_2 = 1-\sqrt2$. The generating function now takes the form
$$G(z) = \frac{z\bigl(m - (r+m)z\bigr)}{(1-2z-z^2)(1-z)} = \frac{-r + (2m+r)z}{2(1-2z-z^2)} + \frac{r}{2(1-z)} = mz + (2m-r)z^2 + \cdots,$$
where $r = d/c$. Since $g_2$ is an integer, $r$ is an integer. We also have
$$g_n = \alpha(1+\sqrt2)^n + \hat\alpha(1-\sqrt2)^n + \tfrac12r = \lfloor\alpha(1+\sqrt2)^n\rfloor,$$
and this can hold only if $r = -1$, because $(1-\sqrt2)^n$ alternates in sign as it approaches zero. Hence $(a,b,c,d) = \pm(1,2,-1,1)$. Now we find $\alpha = \frac14(1+\sqrt2\,m)$, which is between 0 and 1 only if $0 \le m \le 2$. Each of these values actually gives a solution; the sequences $\langle g_n\rangle$ are $\langle0,0,1,3,8,\ldots\rangle$, $\langle0,1,3,8,20,\ldots\rangle$, and $\langle0,2,5,13,32,\ldots\rangle$.

7.49 (a) The denominator of $\bigl(1/(1-(1+\sqrt2)z) + 1/(1-(1-\sqrt2)z)\bigr)$ is $1 - 2z - z^2$; hence $a_n = 2a_{n-1} + a_{n-2}$ for $n \ge 2$. (b) True because $a_n$ is even and $-1 < 1-\sqrt2 < 0$. (c) Let
$$b_n = \Bigl(\frac{p+\sqrt q}{2}\Bigr)^n + \Bigl(\frac{p-\sqrt q}{2}\Bigr)^n.$$
We would like $b_n$ to be odd for all $n > 0$, and $-1 < (p-\sqrt q)/2 < 0$. Working as in part (a), we find $b_0 = 2$, $b_1 = p$, and $b_n = pb_{n-1} + \frac14(q-p^2)b_{n-2}$ for $n \ge 2$. One satisfactory solution has $p = 3$ and $q = 17$.

7.50 Extending the multiplication idea of exercise 22, we have

[Figure: $Q$ expressed as a single base line, plus two $Q$'s pasted onto a triangle, plus three $Q$'s pasted onto a square, plus four $Q$'s pasted onto a pentagon, and so on.]

Replace each $n$-gon by $z^{n-2}$. This substitution behaves properly under multiplication, because the pasting operation takes an $m$-gon and an $n$-gon into an $(m+n-2)$-gon. Thus the generating function is
$$Q = 1 + zQ^2 + z^2Q^3 + z^3Q^4 + \cdots = 1 + \frac{zQ^2}{1-zQ}$$
and the quadratic formula gives $Q = (1 + z - \sqrt{1-6z+z^2})/4z$. The coefficient of $z^{n-2}$ in this power series is the number of ways to put nonoverlapping diagonals into a convex $n$-gon. These coefficients apparently have no closed form in terms of other quantities that we have discussed in this book, but their asymptotic behavior is known [173, exercise 2.2.1-12].


Give me Legendre polynomials
and I'll give you a
closed form.
Incidentally, if each $n$-gon in $Q$ is replaced by $wz^{n-2}$ we get
$$Q = \frac{1 + z - \sqrt{1 - (4w+2)z + z^2}}{2(1+w)z},$$
a formula in which the coefficient of $w^mz^{n-2}$ is the number of ways to divide an $n$-gon into $m$ polygons by nonintersecting diagonals.
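The functional equation for the one-variable $Q$ in answer 7.50 can be turned into a recurrence for its coefficients. Multiplying $Q = 1 + zQ^2/(1-zQ)$ through by $1 - zQ$ gives $(1+z)Q = 1 + 2zQ^2$, so $q_n = 2\sum_{i+j=n-1}q_iq_j - q_{n-1}$ with $q_0 = 1$. The sketch below (illustrative name) tabulates the first few values.

```python
def dissection_counts(count):
    """Coefficients q_0, q_1, ... of Q, from (1 + z) Q = 1 + 2 z Q^2."""
    q = [1]
    for n in range(1, count):
        conv = sum(q[i] * q[n - 1 - i] for i in range(n))
        q.append(2 * conv - q[n - 1])
    return q
```

Here `q[n - 2]` counts the ways to put nonoverlapping diagonals into a convex $n$-gon; for instance, a quadrilateral has 3 (no diagonal, or either one of the two).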

7.51  The  key  first  step  is  to  observe  that  the  square  of  the  number  of  ways 
is  the  number  of  cycle  patterns  of  a certain  kind,  generalizing  exercise  27. 
These  can  be  enumerated  by  evaluating  the  determinant  of  a matrix  whose 
eigenvalues  are  not  difficult  to  determine.  When  m = 3 and  n = 4,  the  fact 
that $\cos 36° = \phi/2$ is helpful (exercise 6.46).

7.52 The first few cases are $p_0(y) = 1$, $p_1(y) = y$, $p_2(y) = y^2 + y$, $p_3(y) = y^3 + 3y^2 + 3y$. Let $p_n(y) = q_{2n}(x)$ where $y = x(1-x)$; we seek a generating function that defines $q_{2n+1}(x)$ in a convenient way. One such function is $\sum_nq_n(x)z^n/n! = 2e^{ixz}/(e^{iz}+1)$, from which it follows that $q_n(x) = i^nE_n(x)$, where $E_n(x)$ is called an Euler polynomial. We have $\sum(-1)^xx^n\,\delta x = \frac12(-1)^{x+1}E_n(x)$, so Euler polynomials are analogous to Bernoulli polynomials, and they have factors analogous to those in (6.98). By exercise 6.23 we have $nE_{n-1}(x) = \sum_{k=0}^{n}\binom nkB_kx^{n-k}(2-2^{k+1})$; this polynomial has integer coefficients by exercise 6.54. Hence $q_{2n}(x)$, whose coefficients have denominators that are powers of 2, must have integer coefficients. Hence $p_n(y)$ has integer coefficients. Finally, the relation $(4y-1)p_n''(y) + 2p_n'(y) = 2n(2n-1)p_{n-1}(y)$ shows that
$$2m(2m-1)\left|{n\atop m}\right| = m(m+1)\left|{n\atop m+1}\right| + 2n(2n-1)\left|{n-1\atop m-1}\right|,$$
and it follows that the $\left|{n\atop m}\right|$'s are positive. (A similar proof shows that the related quantity $(-1)^n(2n+2)E_{2n+1}(x)/(2x-1)$ has positive integer coefficients, when expressed as an $n$th degree polynomial in $y$.) It can be shown that $\left|{n\atop1}\right|$ is the Genocchi number $(-1)^{n-1}(2^{2n+1}-2)B_{2n}$ (see exercise 6.24), and that $\left|{n\atop n-1}\right| = \binom n2$, $\left|{n\atop n-2}\right| = 2\binom{n+1}{4} + 3\binom n4$, etc.

7.53 It is $P_{(1+V_{4n+1}+V_{4n+3})/6}$. Thus, for example, $T_{20} = P_{12} = 210$; $T_{285} = P_{165} = 40755$.
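The examples can be confirmed by brute force. The sketch below assumes the usual definitions $T_n = n(n+1)/2$ and $P_k = k(3k-1)/2$ for triangular and pentagonal numbers, which are not restated in this answer; the function names are illustrative.

```python
from math import isqrt

def is_triangular(x):
    """x = n(n+1)/2 for some integer n >= 0 exactly when 8x + 1 is a square."""
    r = isqrt(8 * x + 1)
    return r * r == 8 * x + 1

def triangular_pentagonal(limit_k):
    """Pairs (k, P_k) with P_k also triangular, for 1 <= k <= limit_k."""
    found = []
    for k in range(1, limit_k + 1):
        p = k * (3 * k - 1) // 2
        if is_triangular(p):
            found.append((k, p))
    return found
```

With `limit_k = 200` the search returns the pentagonal indices 1, 12, and 165.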

7.54 Let $E_k$ be the operation on power series that sets all coefficients to zero except those of $z^n$ where $n \bmod m = k$. The stated construction is equivalent to the operation
$$E_0\,S\,E_0\,S\,(E_0+E_1)\,S\ldots S\,(E_0+E_1+\cdots+E_{m-1})$$
applied to $1/(1-z)$, where $S$ means "multiply by $1/(1-z)$." There are $m!$ terms
$$E_0SE_{k_1}SE_{k_2}S\ldots SE_{k_m}$$
where $0 \le k_j < j$, and every such term evaluates to $z^{rm}/(1-z^m)^{m+1}$ if $r$ is the number of places where $k_j < k_{j+1}$. Exactly $\left\langle{m\atop r}\right\rangle$ terms have a given value of $r$, so the coefficient of $z^{mn}$ is $\sum_{r=0}^{m-1}\left\langle{m\atop r}\right\rangle\binom{n+m-r}{m} = (n+1)^m$ by (6.37). (The fact that operation $E_k$ can be expressed with complex roots of unity seems to be of no help in this problem.)

7.55 Suppose that $P_0(z)F(z) + \cdots + P_m(z)F^{(m)}(z) = Q_0(z)G(z) + \cdots + Q_n(z)G^{(n)}(z) = 0$, where $P_m(z)$ and $Q_n(z)$ are nonzero. (a) Let $H(z) = F(z) + G(z)$. Then there are rational functions $R_{k,l}(z)$ for $0 \le l < m+n$ such that $H^{(k)}(z) = R_{k,0}(z)F^{(0)}(z) + \cdots + R_{k,m-1}(z)F^{(m-1)}(z) + R_{k,m}(z)G^{(0)}(z) + \cdots + R_{k,m+n-1}(z)G^{(n-1)}(z)$. The $m+n+1$ vectors $(R_{k,0}(z),\ldots,R_{k,m+n-1}(z))$ are linearly dependent in the $(m+n)$-dimensional vector space whose components are rational functions; hence there are rational functions $S_l(z)$, not all zero, such that $S_0(z)H^{(0)}(z) + \cdots + S_{m+n}(z)H^{(m+n)}(z) = 0$. (b) Similarly, let $H(z) = F(z)G(z)$. There are rational $R_{k,l}(z)$ for $0 \le l < mn$ with $H^{(k)}(z) = \sum_{i=0}^{m-1}\sum_{j=0}^{n-1}R_{k,ni+j}(z)F^{(i)}(z)G^{(j)}(z)$, hence $S_0(z)H^{(0)}(z) + \cdots + S_{mn}(z)H^{(mn)}(z) = 0$ for some rational $S_l(z)$, not all zero. (A similar proof shows that if $\langle f_n\rangle$ and $\langle g_n\rangle$ are polynomially recursive, so are $\langle f_n + g_n\rangle$ and $\langle f_ng_n\rangle$. Incidentally, there is no similar result for quotients; for example, $\cos z$ is differentiably finite, but $1/\cos z$ is not.)

7.56 Euler showed, incidentally, that this number is also $[z^n]\,1/\sqrt{1-2z-3z^2}$,
and he gave the formula $a_n = \sum_{k\ge0} n^{\underline{2k}}/k!^2$. He also discovered a "memorable
failure of induction" while examining these numbers: Although $3a_n - a_{n+1}$ is
equal to $F_{n-1}(F_{n-1}+1)$ for $0 \le n < 9$, this empirical law mysteriously breaks
down when $n$ is 9 or more!
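
The "failure of induction" in answer 7.56 is easy to replay numerically. The sketch below is only an illustration: the helper names `central_trinomial`, `fib`, and `law_holds` are made up here, and the falling-power formula is rewritten as a sum of products of binomial coefficients (an equivalent form, since $n^{\underline{2k}}/k!^2 = \binom{n}{k}\binom{n-k}{k}$).

```python
from math import comb

def central_trinomial(n):
    """a_n = sum_k n^(2k falling) / k!^2, written as sum_k C(n, k) * C(n - k, k)."""
    return sum(comb(n, k) * comb(n - k, k) for k in range(n // 2 + 1))

def fib(n):
    """Fibonacci number F_n for n >= -1, using the convention F_{-1} = 1, F_0 = 0."""
    a, b = 1, 0                      # (F_{-1}, F_0)
    for _ in range(n + 1):
        a, b = b, a + b
    return a

def law_holds(n):
    """Check Euler's empirical law 3 a_n - a_{n+1} = F_{n-1} (F_{n-1} + 1)."""
    f = fib(n - 1)
    return 3 * central_trinomial(n) - central_trinomial(n + 1) == f * (f + 1)

# Smallest n for which the empirical law breaks down.
first_failure = next(n for n in range(50) if not law_holds(n))
print(first_failure)
```

The law holds for every n from 0 through 8 and first fails at n = 9, where the left side is 464 but the Fibonacci product is 462.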

7.57  (Paul  Erdos  currently  offers  $500  for  a solution.) 

8.1 $\frac{1}{24}+\frac{1}{48}+\frac{1}{48}+\frac{1}{48}+\frac{1}{48}+\frac{1}{24} = \frac16$. (In fact, we always get doubles with
probability $\frac16$ when at least one of the dice is fair.) Any two faces whose sum
is 7 have the same probability in distribution $\Pr_1$, so $S = 7$ has the same
probability as doubles.

8.2 There are 12 ways to specify the top and bottom cards and 50! ways to
arrange the others; so the probability is $12\cdot50!/52! = 12/(51\cdot52) = 1/221$.

8.3 $\frac{1}{10}(3+2+\cdots+9+2) = 4.8$; $\frac19(3^2+2^2+\cdots+9^2+2^2-10(4.8)^2) = \frac{388}{45} \approx 8.6$.
The true mean and variance with a fair coin are 6 and 22, so Stanford had
an unusually heads-up class. The corresponding Princeton figures are 6.4 and
$\frac{562}{45} \approx 12.5$. (This distribution has $\kappa_4 = 2974$, which is rather large. Hence the
standard deviation of this variance estimate when $n = 10$ is also rather large,
$\sqrt{2974/10 + 2(22)^2/9} \approx 20.1$ according to exercise 54. One cannot complain
that the students cheated.)

8.4 This follows from (8.38) and (8.39), because $F(z) = G(z)H(z)$. (A
similar formula holds for all the cumulants, even though $F(z)$ and $G(z)$ may
have negative coefficients.)

8.5 Replace H by $p$ and T by $q = 1 - p$. If $S_A = S_B = \frac12$ we have $p^2qN = \frac12$
and $pq^2N = \frac12q + \frac12$; the solution is $p = 1/\phi^2$, $q = 1/\phi$.

8.6 In this case $X|y$ has the same distribution as $X$, for all $y$, hence
$E(X|Y) = EX$ is constant and $V(E(X|Y)) = 0$. Also $V(X|Y)$ is constant and
equal to its expected value.

8.7 We have $1 = (p_1+p_2+\cdots+p_6)^2 \le 6(p_1^2+p_2^2+\cdots+p_6^2)$ by Chebyshev's
summation inequality of Chapter 2.

8.8 Let $p = \Pr(\omega \in A \cap B)$, $q = \Pr(\omega \notin A)$, and $r = \Pr(\omega \notin B)$. Then
$p + q + r = 1$, and the identity to be proved is $p = (p+r)(p+q) - qr$.

8.9 This is true (subject to the obvious proviso that $F$ and $G$ are defined
on the respective ranges of $X$ and $Y$), because

$\Pr(F(X)=f \text{ and } G(Y)=g) = \sum_{x\in F^{-1}(f),\; y\in G^{-1}(g)} \Pr(X=x \text{ and } Y=y)$

$\qquad = \sum_{x\in F^{-1}(f),\; y\in G^{-1}(g)} \Pr(X=x)\cdot\Pr(Y=y)$

$\qquad = \Pr(F(X)=f)\cdot\Pr(G(Y)=g)$.

8.10 Two. Let $x_1 < x_2$ be medians; then $1 \le \Pr(X \le x_1) + \Pr(X \ge x_2) \le 1$,
hence equality holds. (Some discrete distributions have no median elements.
For example, let $\Omega$ be the set of all fractions of the form $\pm1/n$, with
$\Pr(+1/n) = \Pr(-1/n) = \frac{3}{\pi^2}n^{-2}$.)

8.11 For example, let $K = k$ with probability $4/(k+1)(k+2)(k+3)$, for all
integers $k \ge 0$. Then $EK = 1$, but $E(K^2) = \infty$. (Similarly we can construct
random variables with finite cumulants through $\kappa_m$ but with $\kappa_{m+1} = \infty$.)

8.12 (a) Let $p_k = \Pr(X = k)$. If $0 < x \le 1$, we have $\Pr(X \le r) = \sum_{k\le r} p_k \le
\sum_{k\le r} x^{k-r}p_k \le \sum_k x^{k-r}p_k = x^{-r}P(x)$. The other inequality has a similar
proof. (b) Let $x = \alpha/(1-\alpha)$ to minimize the right-hand side. (A more
precise estimate for the given sum is obtained in exercise 9.42.)
8.13 (Solution by Boris Pittel.) Let us set $Y = (X_1 + \cdots + X_n)/n$ and
$Z = (X_{n+1} + \cdots + X_{2n})/n$. Then

$\Pr\left(\left|\frac{Y+Z}{2}-\alpha\right| \le |Y-\alpha|\right) \ge \Pr\left(\left|\frac{Y-\alpha}{2}\right| + \left|\frac{Z-\alpha}{2}\right| \le |Y-\alpha|\right) = \Pr(|Z-\alpha| \le |Y-\alpha|) \ge \frac12$.

The last inequality is, in fact, '>' in any discrete probability distribution,
because $\Pr(Y = Z) > 0$.

8.14 Mean(H) = p Mean(F) + q Mean(G); Var(H) = p Var(F) + q Var(G) +
pq(Mean(F) - Mean(G))$^2$. (A mixture is actually a special case of conditional
probabilities: Let $Y$ be the coin, let $X|H$ be generated by $F(z)$, and let $X|T$
be generated by $G(z)$. Then $VX = EV(X|Y) + VE(X|Y)$, where $EV(X|Y) =
pV(X|H) + qV(X|T)$ and $VE(X|Y)$ is the variance of $pz^{\mathrm{Mean}(F)} + qz^{\mathrm{Mean}(G)}$.)

8.15 By the chain rule, $H'(z) = G'(z)F'(G(z))$; $H''(z) = G''(z)F'(G(z)) +
G'(z)^2F''(G(z))$. Hence

Mean(H) = Mean(F) Mean(G);

Var(H) = Var(F) Mean(G)$^2$ + Mean(F) Var(G).

(The random variable corresponding to probability distribution H can be
understood as follows: Determine a nonnegative integer $n$ by distribution F;
then add the values of $n$ independent random variables that have
distribution G. The identity for variance in this exercise is a special case of (8.105),
when $X$ has distribution H and $Y$ has distribution F.)

8.16 $e^{w(z-1)}/(1-w)$.

8.17 $\Pr(Y_{n,p} \le m) = \Pr(Y_{n,p} + n \le m + n)$ = probability that we need $\le
m + n$ tosses to obtain $n$ heads = probability that $m + n$ tosses yield $\ge n$
heads = $\Pr(X_{m+n,p} \ge n)$. Thus

$\sum_{k\le m}\binom{n+k-1}{k}p^nq^k = \sum_{k\ge n}\binom{m+n}{k}p^kq^{m+n-k} = \sum_{k\le m}\binom{m+n}{k}p^{m+n-k}q^k$;

and this is (5.19) with $n = r$, $x = q$, $y = p$.

8.18 (a) $G_X(z) = e^{\mu(z-1)}$. (b) The $m$th cumulant is $\mu$, for all $m \ge 1$. (The
case $\mu = 1$ is called $F_\infty$ in (8.55).)
8.19 (a) $G_{X_1+X_2}(z) = G_{X_1}(z)G_{X_2}(z) = e^{(\mu_1+\mu_2)(z-1)}$. Hence the probability
is $e^{-\mu_1-\mu_2}(\mu_1+\mu_2)^n/n!$; the sum of independent Poisson variables is Poisson.
(b) In general, if $K_mX$ denotes the $m$th cumulant of a random variable $X$, we
have $K_m(aX_1 + bX_2) = a^m(K_mX_1) + b^m(K_mX_2)$, when $a, b \ge 0$. Hence the
answer is $2^m\mu_1 + 3^m\mu_2$.

8.20 The general pgf will be $G(z) = z^m/F(z)$, where

$F(z) = z^m + (1-z)\sum_{k=1}^{m} \tilde{A}_{(k)}\,[A^{(k)} = A_{(k)}]\,z^{m-k}$,

$F'(1) = m - \sum_{k=1}^{m} \tilde{A}_{(k)}\,[A^{(k)} = A_{(k)}]$,

$F''(1) = m(m-1) - 2\sum_{k=1}^{m} (m-k)\,\tilde{A}_{(k)}\,[A^{(k)} = A_{(k)}]$.

8.21 This is $\sum_{n\ge0} q_n$, where $q_n$ is the probability that the game between
Alice and Bill is still incomplete after $n$ flips. Let $p_n$ be the probability that
the game ends at the $n$th flip; then $p_n + q_n = q_{n-1}$. Hence the average time
to play the game is $\sum_{n\ge1} np_n = (q_0-q_1) + 2(q_1-q_2) + 3(q_2-q_3) + \cdots =
q_0 + q_1 + q_2 + \cdots = N$, since $\lim_{n\to\infty} nq_n = 0$.

Another way to establish this answer is to replace H and T by $\frac12z$.
Then the derivative of the first equation in (8.78) tells us that $N(1) + N'(1) =
N'(1) + S_A'(1) + S_B'(1)$.

By the way, $N = \frac{16}{3}$.

8.22 By definition we have $V(X|Y) = E(X^2|Y) - (E(X|Y))^2$ and $V(E(X|Y)) =
E((E(X|Y))^2) - (E(E(X|Y)))^2$; hence $E(V(X|Y)) + V(E(X|Y)) = E(E(X^2|Y)) -
(E(E(X|Y)))^2$. But $E(E(X|Y)) = EX$ and $E(E(X^2|Y)) = E(X^2)$, so the result is
just $VX$.

8.23 Let $\Omega_0 = \{⚀, ⚅\}^2$ and $\Omega_1 = \{⚁, ⚂, ⚃, ⚄\}^2$; and let $\Omega_2$ be the
other 16 elements of $\Omega$. Then $\Pr_{11}(\omega) - \Pr_{00}(\omega) = +\frac{20}{576}, -\frac{7}{576}, +\frac{2}{576}$ according
as $\omega \in \Omega_0, \Omega_1, \Omega_2$. The events $A$ must therefore be chosen with $k_j$ elements
from $\Omega_j$, where $(k_0, k_1, k_2)$ is one of the following: (0,0,0), (0,2,7), (0,4,14),
(1,4,4), (1,6,11), (2,6,1), (2,8,8), (2,10,15), (3,10,5), (3,12,12), (4,12,2),
(4,14,9), (4,16,16). For example, there are $\binom42\binom{16}{6}\binom{16}{1}$ events of type (2,6,1).
The total number of such events is $[z^0]\,(1+z^{20})^4(1+z^{-7})^{16}(1+z^2)^{16}$, which
turns out to be 1304872090. If we restrict ourselves to events that depend
on $S$ only, we get 40 solutions $S \in A$, where $A = \emptyset$, $\{\frac{2}{12}, \frac{4}{10}, \frac{6}{8}\}$, $\{\frac{2}{12}, 5, 9\}$,
$\{2, 12, \frac{4}{10}, \frac{6}{8}, 5, 9\}$, $\{2, 4, 6, 8, 10, 12\}$, $\{\frac{3}{11}, 7, \frac{5}{9}, 4, 10\}$, and the complements of
these sets. (Here the notation '$\frac{2}{12}$' means either 2 or 12 but not both.)
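
The list of admissible types in answer 8.23 can be checked by brute force. The following sketch assumes only what the answer states: class sizes 4, 16, 16 and probability differences +20, -7, +2 (in units of 1/576). The variable names are illustrative.

```python
from math import comb

# An event with k0, k1, k2 elements from the three classes has equal probability
# under both distributions exactly when 20*k0 - 7*k1 + 2*k2 == 0.
types = []
total_events = 0
for k0 in range(4 + 1):
    for k1 in range(16 + 1):
        for k2 in range(16 + 1):
            if 20 * k0 - 7 * k1 + 2 * k2 == 0:
                types.append((k0, k1, k2))
                # Number of ways to pick that many elements from each class.
                total_events += comb(4, k0) * comb(16, k1) * comb(16, k2)

print(types)
print(total_events)
```

The enumeration finds thirteen types, among them (2,10,15); the triple (3,8,15) does not satisfy the balance condition, since 60 - 56 + 30 is not zero. Summing the binomial products over the thirteen types gives 1304872090.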
8.24 (a) Any one of the dice ends up in J's possession with probability
$p = \frac16 + (\frac56)^2p$; hence $p = \frac{6}{11}$. Let $q = \frac{5}{11}$. Then the pgf for J's total holdings
is $(q + pz)^{2n+1}$, with mean $(2n+1)p$ and variance $(2n+1)pq$, by (8.61).

(b) $\binom53p^3q^2 + \binom54p^4q + \binom55p^5 = \frac{94176}{161051} \approx .585$.

8.25 The pgf for the current stake after $n$ rolls is $G_n(z)$, where

$G_0(z) = z^A$;

$G_n(z) = \sum_{k=1}^{6} G_{n-1}(z^{2(k-1)/5})/6$, for $n > 0$.

(The noninteger exponents cause no trouble.) It follows that Mean($G_n$) =
Mean($G_{n-1}$), and Var($G_n$) + Mean($G_n$)$^2$ = $\frac{22}{15}$(Var($G_{n-1}$) + Mean($G_{n-1}$)$^2$).
So the mean is always $A$, but the variance grows to $((\frac{22}{15})^n - 1)A^2$.

8.26 The pgf $F_{l,n}(z)$ satisfies $F'_{l,n}(z) = F_{l,n-l}(z)/l$; hence Mean($F_{l,n}$) =
$F'_{l,n}(1) = [n \ge l]/l$ and $F''_{l,n}(1) = [n \ge 2l]/l^2$; the variance is easily computed.
(In fact, we have

$F_{l,n}(z) = \sum_{0\le k\le n/l} \frac{1}{k!}\left(\frac{z-1}{l}\right)^k$,

which approaches a Poisson distribution with mean $1/l$ as $n \to \infty$.)

This problem can
perhaps be solved
more easily without
generating functions
than with them.

8.27 $(n^2\Sigma_3 - 3n\Sigma_2\Sigma_1 + 2\Sigma_1^3)/n(n-1)(n-2)$ has the desired mean, where
$\Sigma_k = X_1^k + \cdots + X_n^k$. This follows from the identities

$E\Sigma_3 = n\mu_3$;

$E(\Sigma_2\Sigma_1) = n\mu_3 + n(n-1)\mu_2\mu_1$;

$E(\Sigma_1^3) = n\mu_3 + 3n(n-1)\mu_2\mu_1 + n(n-1)(n-2)\mu_1^3$.

Incidentally, the third cumulant is $\kappa_3 = E((X-EX)^3)$, but the fourth cumulant
does not have such a simple expression; we have $\kappa_4 = E((X-EX)^4) - 3(VX)^2$.

8.28 (The exercise implicitly calls for $p = q = \frac12$, but the general answer is
given here for completeness.) Replace H by $pz$ and T by $qz$, getting $S_A(z) =
p^2qz^3/(1-pz)(1-qz)(1-pqz^2)$ and $S_B(z) = pq^2z^3/(1-qz)(1-pqz^2)$. The
pgf for the conditional probability that Alice wins at the $n$th flip, given that
she wins the game, is

$\frac{S_A(z)}{S_A(1)} = z^3\cdot\frac{q}{1-pz}\cdot\frac{p}{1-qz}\cdot\frac{1-pq}{1-pqz^2}$.

This is a product of pseudo-pgf's, whose mean is $3 + p/q + q/p + 2pq/(1-pq)$.
The formulas for Bill are the same but without the factor $q/(1-pz)$, so Bill's
mean is $3 + q/p + 2pq/(1-pq)$. When $p = q = \frac12$, the answer in case (a) is
$\frac{17}{3}$; in case (b) it is $\frac{14}{3}$. Bill wins only half as often, but when he does win he
tends to win sooner. The overall average number of flips is $\frac23\cdot\frac{17}{3} + \frac13\cdot\frac{14}{3} = \frac{16}{3}$,
agreeing with exercise 21. The solitaire game for each pattern has a waiting
time of 8.

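8.29 Set H = T = $\frac12$ in

$1 + N(\mathrm{H} + \mathrm{T}) = N + S_A + S_B + S_C$

$N\,\mathrm{HHTH} = S_A(\mathrm{HTH} + 1) + S_B(\mathrm{HTH} + \mathrm{TH}) + S_C(\mathrm{HTH} + \mathrm{TH})$

$N\,\mathrm{HTHH} = S_A(\mathrm{THH} + \mathrm{H}) + S_B(\mathrm{THH} + 1) + S_C(\mathrm{THH})$

$N\,\mathrm{THHH} = S_A(\mathrm{HH}) + S_B(\mathrm{H}) + S_C$

to get the winning probabilities. In general we will have $S_A + S_B + S_C = 1$
and

$S_A(A{:}A) + S_B(B{:}A) + S_C(C{:}A) = S_A(A{:}B) + S_B(B{:}B) + S_C(C{:}B)$

$\qquad = S_A(A{:}C) + S_B(B{:}C) + S_C(C{:}C)$.

In particular, the equations $9S_A + 3S_B + 3S_C = 5S_A + 9S_B + S_C = 2S_A + 4S_B + 8S_C$
imply that $S_A = \frac{16}{52}$, $S_B = \frac{17}{52}$, $S_C = \frac{19}{52}$.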
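
The linear system in answer 8.29 is small enough to solve exactly. The sketch below is one possible way to do it, not the book's procedure: `corr` computes the "leading numbers" X:Y for a fair coin, and `solve` is a plain Gauss-Jordan elimination over fractions; both names are invented for this illustration.

```python
from fractions import Fraction

def corr(x, y):
    """X:Y = sum of 2^(k-1) over all k such that the last k flips of x equal the first k of y."""
    return sum(2 ** (k - 1)
               for k in range(1, min(len(x), len(y)) + 1)
               if x[-k:] == y[:k])

def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

patterns = ["HHTH", "HTHH", "THHH"]          # Alice, Bill, Computer
# Equate the expression for pattern 0 with those for patterns 1 and 2,
# then add the normalization S_A + S_B + S_C = 1.
system = [[corr(x, patterns[0]) - corr(x, patterns[j]) for x in patterns] for j in (1, 2)]
system.append([1, 1, 1])
probs = solve(system, [0, 0, 1])
print(probs)
```

The leading numbers come out as (9, 3, 3), (5, 9, 1), and (2, 4, 8) for the three target patterns, and the solution is 16/52, 17/52, 19/52 (printed in lowest terms).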

8.30 The variance of $P(h_1, \ldots, h_n; k)|k$ is the variance of the shifted
binomial distribution $((m-1+z)/m)^{k-1}z$, which is $(k-1)(\frac1m)(1-\frac1m)$ by (8.61).
Hence the average of the variance is Mean(S)$(m-1)/m^2$. The variance of
the average is the variance of $(k-1)/m$, namely Var(S)$/m^2$. According to
(8.105), the sum of these two quantities should be $VP$, and it is. Indeed, we
have just replayed the derivation of (8.95) in slight disguise. (See exercise 15.)

8.31 (a) A brute force solution would set up five equations in five unknowns:
$A = \frac12zB + \frac12zE$; $B = \frac12zC$; $C = 1 + \frac12zB + \frac12zD$; $D = \frac12zC + \frac12zE$; $E = \frac12zD$.
But positions $C$ and $D$ are equidistant from the goal, as are $B$ and $E$, so we
can lump them together. If $X = B + E$ and $Y = C + D$, there are now three
equations:

$A = \frac12zX$; $X = \frac12zY$; $Y = 1 + \frac12zX + \frac12zY$.

Hence $A = z^2/(4 - 2z - z^2)$; we have Mean(A) = 6 and Var(A) = 22. (Rings
a bell? In fact, this problem is equivalent to flipping a fair coin until
getting heads twice in a row: Heads means "advance toward the apple" and
tails means "go back.") (b) Chebyshev's inequality says that $\Pr(S \ge 100) =
\Pr((S-6)^2 \ge 94^2) \le 22/94^2 \approx .0025$. (c) The second tail inequality says that
$\Pr(S \ge 100) \le 1/x^{98}(4 - 2x - x^2)$ for all $x \ge 1$, and we get the upper bound
0.00000005 when $x = (\sqrt{49001} - 99)/100$. (The actual probability is
approximately 0.0000000009, according to exercise 37.)
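
The three estimates in answer 8.31 can be compared directly. This sketch relies on the equivalence stated in part (a), namely that S is the number of fair-coin flips needed to see two heads in a row; the function name `prob_at_least` and the two-state bookkeeping are choices made for this illustration.

```python
from fractions import Fraction
from math import sqrt

def prob_at_least(n_flips):
    """Exact Pr(S >= n_flips): the first n_flips - 1 flips contain no two heads in a row."""
    # Probability of being unfinished with last flip tails (or at the start),
    # and of being unfinished with last flip heads.
    after_tail, after_head = Fraction(1), Fraction(0)
    for _ in range(n_flips - 1):
        after_tail, after_head = (after_tail + after_head) / 2, after_tail / 2
    return after_tail + after_head

exact = float(prob_at_least(100))

# Part (b): Chebyshev's inequality with mean 6 and variance 22.
chebyshev = 22 / 94 ** 2

# Part (c): tail inequality x^(-100) A(x), at the value of x quoted in the answer.
x = (sqrt(49001) - 99) / 100
tail_bound = 1 / (x ** 98 * (4 - 2 * x - x * x))

print(exact, tail_bound, chebyshev)
```

The exact probability is about 9.0e-10, the tail bound about 4.6e-8, and the Chebyshev bound about 2.5e-3, so each refinement gains several orders of magnitude.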
8.32 By symmetry, we can reduce each month's situation to one of four
possibilities: 

D,  the  states  are  diagonally  opposite; 

A,  the  states  are  adjacent  and  not  Kansas; 

K,  the  states  are  Kansas  and  one  other; 

S,  the  states  are  the  same. 


“Toto,  I have  a 
feeling  we’re  not  in 
Kansas  anymore.  ” 

-Dorothy 


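Considering the Markovian transitions, we get four equations

$D = 1 + z(\frac29D + \frac{2}{12}K)$

$A = z(\frac49A + \frac{4}{12}K)$

$K = z(\frac49D + \frac49A + \frac{4}{12}K)$

$S = z(\frac39D + \frac19A + \frac{2}{12}K)$

whose sum is $D + K + A + S = 1 + z(D + A + K)$. The solution is

$S = \dfrac{81z - 45z^2 - 4z^3}{243 - 243z + 24z^2 + 8z^3}$,

but the simplest way to find the mean and variance may be to write $z = 1 + w$
and expand in powers of $w$, ignoring multiples of $w^2$:

$D = \frac{27}{16} + \frac{1593}{512}w + \cdots$;

$A = \frac{9}{8} + \frac{2115}{256}w + \cdots$;

$K = \frac{15}{8} + \frac{2661}{256}w + \cdots$.

Now $S'(1) = \frac{27}{16} + \frac98 + \frac{15}{8} = \frac{75}{16}$ and $\frac12S''(1) = \frac{1593}{512} + \frac{2115}{256} + \frac{2661}{256} = \frac{11145}{512}$. The
mean is $\frac{75}{16}$ and the variance is $\frac{105}{4}$. (Is there a simpler way?)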
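
One alternative route to the mean and variance in answer 8.32 is to solve for the first two moments of the absorption time directly, without expanding generating functions. The sketch below is an assumption-laden illustration: the transition matrix is read off the four equations (rows are "from" states D, A, K; the missing probability mass leads to S), and `solve` is a generic exact linear solver written for this example.

```python
from fractions import Fraction as Fr

def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

# One-month transition probabilities among the transient states D, A, K.
P = [[Fr(2, 9),  Fr(0),     Fr(4, 9)],     # from D
     [Fr(0),     Fr(4, 9),  Fr(4, 9)],     # from A
     [Fr(2, 12), Fr(4, 12), Fr(4, 12)]]    # from K

I_minus_P = [[Fr(int(i == j)) - P[i][j] for j in range(3)] for i in range(3)]

# First moments t satisfy t = 1 + P t; second moments m satisfy m = (2t - 1) + P m.
t = solve(I_minus_P, [Fr(1)] * 3)
m = solve(I_minus_P, [2 * ti - 1 for ti in t])

mean = t[0]                  # the process starts in state D
variance = m[0] - t[0] ** 2
print(mean, variance)
```

Starting from state D this gives mean 75/16 and variance 105/4, in agreement with the values obtained from the power series in $w$.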


8.33 First answer: Clearly yes, because the hash values $h_1, \ldots, h_n$ are
independent. Second answer: Certainly no, even though the hash values $h_1,
\ldots, h_n$ are independent. We have $\Pr(X_j = 0) = \sum_{k=1}^{n} s_k([j \ne k](m-1)/m) =
(1 - s_j)(m-1)/m$, but $\Pr(X_1 = X_2 = 0) = \sum_{k=1}^{n} s_k[k > 2](m-1)^2/m^2 =
(1 - s_1 - s_2)(m-1)^2/m^2 \ne \Pr(X_1 = 0)\Pr(X_2 = 0)$.


8.34 Let $[z^n]\,S_m(z)$ be the probability that Gina has advanced $< m$ steps
after taking $n$ turns. Then $S_m(1)$ is her average score on a par-$m$ hole;
$[z^m]\,S_m(z)$ is the probability that she loses such a hole against a steady player;
and $1 - [z^{m-1}]\,S_m(z)$ is the probability that she wins it. We have the recurrence

$S_0(z) = 0$;

$S_m(z) = (1 + pzS_{m-2}(z) + qzS_{m-1}(z))/(1 - rz)$, for $m > 0$.

To solve part (a), it suffices to compute the coefficients for $m, n \le 4$; it is
convenient to replace $z$ by $100w$ so that the computations involve nothing
but integers. We obtain the following tableau of coefficients:

    S_0    0    0     0       0         0
    S_1    1    4    16      64       256
    S_2    1   95   744    4432     23552
    S_3    1  100  9065  104044    819808
    S_4    1  100  9975  868535  12964304

Therefore Gina wins with probability $1 - .868535 = .131465$; she loses with
probability .12964304. (b) To find the mean number of strokes, we compute

$S_1(1) = \frac{25}{24}$; $S_2(1) = \frac{4675}{2304}$; $S_3(1) = \frac{667825}{221184}$; $S_4(1) = \frac{85134475}{21233664}$.

(Incidentally, $S_5(1) \approx 4.9995$; she wins with respect to both holes and strokes
on a par-5 hole, but loses either way when par is 3.)
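
The tableau in answer 8.34 follows mechanically from the recurrence. In the sketch below the values p = .05, q = .91, r = .04 are inferred from the tableau itself (for instance the row 1, 4, 16, 64, 256 forces r = .04), and the convention $S_{-1}(z) = 0$ is an assumption needed to start the recurrence at m = 1; the function names are illustrative.

```python
from fractions import Fraction

P, Q, R = 5, 91, 4    # p, q, r multiplied by 100, matching the substitution z = 100w

def tableau(max_par, max_turns):
    """rows[m][n] = 100^n [z^n] S_m(z), from (1 - rz) S_m = 1 + pz S_{m-2} + qz S_{m-1}."""
    zero = [0] * (max_turns + 1)
    rows = {-1: zero, 0: zero}            # assumed convention: S_{-1} = S_0 = 0
    for m in range(1, max_par + 1):
        row = [1]
        for n in range(1, max_turns + 1):
            row.append(R * row[n - 1] + P * rows[m - 2][n - 1] + Q * rows[m - 1][n - 1])
        rows[m] = row
    return rows

def mean_strokes(par):
    """S_par(1), computed exactly from the same recurrence evaluated at z = 1."""
    p, q, r = Fraction(P, 100), Fraction(Q, 100), Fraction(R, 100)
    prev2, prev1 = Fraction(0), Fraction(0)
    for _ in range(par):
        prev2, prev1 = prev1, (1 + p * prev2 + q * prev1) / (1 - r)
    return prev1

rows = tableau(4, 4)
win = 1 - Fraction(rows[4][3], 100 ** 3)
lose = Fraction(rows[4][4], 100 ** 4)
print(rows[4], float(win), float(lose), mean_strokes(4))
```

Running it reproduces the five rows of the tableau, the winning and losing probabilities .131465 and .12964304 on a par-4 hole, and the four mean scores quoted in part (b).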

8.35 The condition will be true for all $n$ if and only if it is true for $n = 1$,
by the Chinese remainder theorem. One necessary and sufficient condition is
the polynomial identity

$(p_2+p_4+p_6 + (p_1+p_3+p_5)w)\,(p_3+p_6 + (p_1+p_4)z + (p_2+p_5)z^2)$
$\qquad = (p_1wz + p_2z^2 + p_3w + p_4z + p_5wz^2 + p_6)$,

but that just more-or-less restates the problem. A simpler characterization is

$(p_2+p_4+p_6)(p_3+p_6) = p_6$, $\quad (p_1+p_3+p_5)(p_2+p_5) = p_5$,

which checks only two of the coefficients in the former product. The general
solution has three degrees of freedom: Let $a_0 + a_1 = b_0 + b_1 + b_2 = 1$, and
put $p_1 = a_1b_1$, $p_2 = a_0b_2$, $p_3 = a_1b_0$, $p_4 = a_0b_1$, $p_5 = a_1b_2$, $p_6 = a_0b_0$.

8.36 (a) One die has faces 1, 2, 2, 3, 3, 4 and the other has faces
1, 3, 4, 5, 6, 8. (b) If the $k$th die has faces with $s_1, \ldots, s_6$ spots,
let $p_k(z) = z^{s_1} + \cdots + z^{s_6}$. We want to find such polynomials with
$p_1(z) \ldots p_n(z) = (z + z^2 + z^3 + z^4 + z^5 + z^6)^n$. The irreducible factors of this
polynomial with rational coefficients are $z^n(z+1)^n(z^2+z+1)^n(z^2-z+1)^n$;
hence $p_k(z)$ must be of the form $z^{a_k}(z+1)^{b_k}(z^2+z+1)^{c_k}(z^2-z+1)^{d_k}$. We
must have $a_k \ge 1$, since $p_k(0) = 0$; and in fact $a_k = 1$, since $a_1 + \cdots + a_n = n$.
Furthermore the condition $p_k(1) = 6$ implies that $b_k = c_k = 1$. It is now easy
to see that $0 \le d_k \le 2$, since $d_k > 2$ gives negative coefficients. When $d = 0$
and $d = 2$, we get the two dice in part (a); therefore the only solutions have
$k$ pairs of dice as in (a), plus $n - 2k$ ordinary dice, for some $k \le \frac12n$.
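
The factorization argument of answer 8.36 can be replayed with a few lines of polynomial arithmetic. This is a minimal sketch; `poly_mul`, `die_polynomial`, and `faces` are names invented for the illustration, and polynomials are stored as coefficient lists indexed by exponent.

```python
from collections import Counter
from itertools import product

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = exponent)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def die_polynomial(d):
    """Coefficients of z (z+1) (z^2+z+1) (z^2-z+1)^d."""
    poly = poly_mul(poly_mul([0, 1], [1, 1]), [1, 1, 1])
    for _ in range(d):
        poly = poly_mul(poly, [1, -1, 1])
    return poly

def faces(poly):
    """Turn a polynomial with nonnegative coefficients into a sorted list of face values."""
    return [spots for spots, count in enumerate(poly) for _ in range(count)]

low, ordinary, high = (faces(die_polynomial(d)) for d in (0, 1, 2))

def sum_distribution(die1, die2):
    return Counter(a + b for a, b in product(die1, die2))

print(low, high, sum_distribution(low, high) == sum_distribution(ordinary, ordinary))
```

With d = 0, 1, 2 the faces are (1,2,2,3,3,4), (1,2,3,4,5,6), and (1,3,4,5,6,8); with d = 3 a coefficient becomes negative, which is why no further dice arise.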
8.37 The number of coin-toss sequences of length $n$ is $F_{n-1}$, for all $n > 0$,
because of the relation between domino tilings and coin flips. Therefore the
probability that exactly $n$ tosses are needed is $F_{n-1}/2^n$, when the coin is fair.
Also $q_n = F_{n+1}/2^{n-1}$, since $\sum_{k\ge n} F_kz^k = (F_nz^n + F_{n-1}z^{n+1})/(1 - z - z^2)$.
(A systematic solution via generating functions is, of course, also possible.)

8.38 When $k$ faces have been seen, the task of rolling a new one is equivalent
to flipping coins with success probability $p_k = (m-k)/m$. Hence the pgf is
$\prod_{k=0}^{l-1} p_kz/(1 - q_kz) = \prod_{k=0}^{l-1} (m-k)z/(m-kz)$. The mean is $\sum_{k=0}^{l-1} p_k^{-1} =
m(H_m - H_{m-l})$; the variance is $m^2(H_m^{(2)} - H_{m-l}^{(2)}) - m(H_m - H_{m-l})$; and
equation (7.47) provides a closed form for the requested probability, namely
$m^{-n}m!\left\{{n-1 \atop l-1}\right\}/(m-l)!$. (The problem discussed in this exercise is traditionally
called "coupon collecting.")

8.39 $E(X) = P(-1)$; $V(X) = P(-2) - P(-1)^2$; $E(\ln X) = -P'(0)$.

8.40 (a) We have $\kappa_m = n(0!\left\{{m \atop 1}\right\}p - 1!\left\{{m \atop 2}\right\}p^2 + 2!\left\{{m \atop 3}\right\}p^3 - \cdots)$, by (7.49).
Incidentally, the third cumulant is $npq(q-p)$ and the fourth is $npq(1-6pq)$.
The identity $q + pe^t = (p + qe^{-t})e^t$ shows that $f_m(p) = (-1)^mf_m(q) + [m = 1]$;
hence we can write $f_m(p) = g_m(pq)(q-p)^{[m\ \mathrm{odd}]}$, where $g_m$ is a polynomial
of degree $\lfloor m/2\rfloor$, whenever $m > 1$. (b) Let $p = \frac12$ and $F(t) = \ln(\frac12 + \frac12e^t)$.
Then $\sum_{m\ge1} \kappa_mt^{m-1}/(m-1)! = F'(t) = 1 - 1/(e^t + 1)$, and we can use exercise
6.23.

8.41 If $G(z)$ is the pgf for a random variable $X$ that assumes only positive
integer values, then $\int_0^1 G(z)\,dz/z = \sum_{k\ge1} \Pr(X=k)/k = E(X^{-1})$. If $X$ is the
distribution of the number of flips to obtain $n + 1$ heads, we have $G(z) =
(pz/(1-qz))^{n+1}$ by (8.59), and the integral is

$\int_0^1 \left(\frac{pz}{1-qz}\right)^{n+1}\frac{dz}{z} = \int_0^1 \frac{w^n\,dw}{1 + (q/p)w}$

if we substitute $w = pz/(1-qz)$. When $p = q$ the integrand can be written
$(-1)^n((1+w)^{-1} - 1 + w - w^2 + \cdots + (-1)^nw^{n-1})$, so the integral is $(-1)^n(\ln 2 -
1 + \frac12 - \frac13 + \cdots + (-1)^n/n)$. We have $H_{2n} - H_n = \ln 2 - \frac14n^{-1} + \frac{1}{16}n^{-2} + O(n^{-4})$
by (9.28), and it follows that $E(X_{n+1}^{-1}) = \frac12n^{-1} - \frac14n^{-2} + O(n^{-4})$.

8.42 Let $F_n(z)$ and $G_n(z)$ be pgf's for the number of employed evenings, if
the man is initially unemployed or employed, respectively. Let $q_h = 1 - p_h$
and $q_f = 1 - p_f$. Then $F_0(z) = G_0(z) = 1$, and

$F_n(z) = p_hzG_{n-1}(z) + q_hF_{n-1}(z)$;

$G_n(z) = p_fF_{n-1}(z) + q_fzG_{n-1}(z)$.

The solution is given by the super generating function

$G(w,z) = \sum_{n\ge0} G_n(z)w^n = A(w)/(1 - zB(w))$,

where $B(w) = w(q_f - (q_f - p_h)w)/(1 - q_hw)$ and $A(w) = (1 - B(w))/(1 - w)$.
Now $\sum_{n\ge0} G_n'(1)w^n = \alpha w/(1-w)^2 + \beta/(1-w) - \beta/(1 - (q_f-p_h)w)$ where

$\alpha = \dfrac{p_h}{p_h + p_f}$, $\quad \beta = \dfrac{p_f(q_f - p_h)}{(p_h + p_f)^2}$;

hence $G_n'(1) = \alpha n + \beta(1 - (q_f - p_h)^n)$. (Similarly $G_n''(1) = \alpha^2n^2 + O(n)$, so
the variance is $O(n)$.)
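
The closed form for $G_n'(1)$ in answer 8.42 can be checked against the recurrences themselves. The sketch below differentiates the two recurrences at z = 1 (using $F_n(1) = G_n(1) = 1$) and iterates them with exact fractions; the function names and the sample probabilities are arbitrary choices for the illustration.

```python
from fractions import Fraction

def mean_employed(n, ph, pf):
    """Return (F_n'(1), G_n'(1)) by iterating the recurrences differentiated at z = 1."""
    f = g = Fraction(0)
    for _ in range(n):
        # F_n' = p_h (1 + G_{n-1}') + q_h F_{n-1}',  G_n' = p_f F_{n-1}' + q_f (1 + G_{n-1}')
        f, g = ph * (1 + g) + (1 - ph) * f, pf * f + (1 - pf) * (1 + g)
    return f, g

def closed_form(n, ph, pf):
    """alpha n + beta (1 - (q_f - p_h)^n), with alpha and beta as in the answer."""
    alpha = ph / (ph + pf)
    beta = pf * (1 - pf - ph) / (ph + pf) ** 2
    return alpha * n + beta * (1 - (1 - pf - ph) ** n)

ph, pf = Fraction(1, 2), Fraction(1, 4)     # sample values, chosen arbitrarily
for n in range(6):
    print(n, mean_employed(n, ph, pf)[1], closed_form(n, ph, pf))
```

For these sample values the two columns agree exactly; for instance both give 135/64 at n = 3.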

8.43 $G_n(z) = \sum_{k\ge0} \left[{n \atop k}\right]z^k/n! = z^{\overline{n}}/n!$, by (6.11). This is a product of
binomial pgf's, $\prod_{k=1}^{n} ((k-1+z)/k)$, where the $k$th has mean $1/k$ and variance
$(k-1)/k^2$; hence Mean($G_n$) = $H_n$ and Var($G_n$) = $H_n - H_n^{(2)}$.

8.44 (a) The champion must be undefeated in $n$ rounds, so the answer is $p^n$.
(b,c) Players $x_1, \ldots, x_{2^k}$ must be "seeded" (by chance) in distinct
subtournaments and they must win all $2^k(n-k)$ of their matches. The $2^n$ leaves of
the tournament tree can be filled in $2^n!$ ways; to seed it we have $2^k!(2^{n-k})^{2^k}$
ways to place the top $2^k$ players, and $(2^n - 2^k)!$ ways to place the others.
Hence the probability is $(2p)^{2^k(n-k)}/\binom{2^n}{2^k}$. When $k = 1$ this simplifies to
$(2p^2)^{n-1}/(2^n - 1)$. (d) Each tournament outcome corresponds to a
permutation of the players: Let $y_1$ be the champ; let $y_2$ be the other finalist; let $y_3$ and
$y_4$ be the players who lost to $y_1$ and $y_2$ in the semifinals; let $(y_5, \ldots, y_8)$ be
those who lost respectively to $(y_1, \ldots, y_4)$ in the quarterfinals; etc. (Another
proof shows that the first round has $2^n!/2^{n-1}!$ essentially different outcomes;
the second round has $2^{n-1}!/2^{n-2}!$; and so on.) (e) Let $S_k$ be the set of $2^{k-1}$
potential opponents of $x_2$ in the $k$th round. The conditional probability that
$x_2$ wins, given that $x_1$ belongs to $S_k$, is

$\Pr(x_1 \text{ plays } x_2)\cdot p^{n-1}(1-p) + \Pr(x_1 \text{ doesn't play } x_2)\cdot p^n$

$\qquad = p^{k-1}p^{n-1}(1-p) + (1 - p^{k-1})p^n$.

The chance that $x_1 \in S_k$ is $2^{k-1}/(2^n - 1)$; summing on $k$ gives the answer:

$\sum_{k=1}^{n} \frac{2^{k-1}}{2^n-1}\left(p^{k-1}p^{n-1}(1-p) + (1-p^{k-1})p^n\right) = p^n - \frac{(2p)^n - 1}{2^n - 1}\,p^{n-1}$.

(f) Each of the $2^n!$ tournament outcomes has a certain probability of
occurring, and the probability that $x_j$ wins is the sum of these probabilities over
all $(2^n - 1)!$ tournament outcomes in which $x_j$ is champion. Consider
interchanging $x_j$ with $x_{j+1}$ in all those outcomes; this change doesn't affect the
probability if $x_j$ and $x_{j+1}$ never meet, but it multiplies the probability by
$(1-p)/p < 1$ if they do meet.

8.45 (a) $A(z) = 1/(3 - 2z)$; $B(z) = zA(z)^2$; $C(z) = z^2A(z)^3$. The pgf for
sherry when it's bottled is $z^3A(z)^3$, which is $z^3$ times a negative binomial
distribution with parameters $n = 3$, $p = \frac13$. (b) Mean(A) = 2, Var(A) = 6;
Mean(B) = 5, Var(B) = 2Var(A) = 12; Mean(C) = 8, Var(C) = 18. The
sherry is nine years old, on the average. The fraction that's 25 years old is
$\binom{-3}{22}(-2)^{22}3^{-25} = \binom{24}{22}2^{22}3^{-25} = 23\cdot(\frac23)^{24} \approx .00137$. (c) Let the coefficient of
$w^n$ be the pgf for the beginning of year $n$. Then

$A = (1 + \frac13w/(1-w))/(1 - \frac23zw)$;

$B = (1 + \frac13zwA)/(1 - \frac23zw)$;

$C = (1 + \frac13zwB)/(1 - \frac23zw)$.

Differentiate with respect to $z$ and set $z = 1$; this makes

$C' = \dfrac{8}{1-w} - \dfrac{1/2}{(1-\frac23w)^3} - \dfrac{3/2}{(1-\frac23w)^2} - \dfrac{6}{1-\frac23w}$.

The average age of bottled sherry $n$ years after the process started is 1 greater
than the coefficient of $w^{n-1}$, namely $9 - (\frac23)^n(3n^2 + 21n + 72)/8$. (This already
exceeds 8 when $n = 11$.)
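
The numerical claims of answer 8.45 are quick to confirm. The sketch below takes the bottled-sherry pgf $z^3/(3-2z)^3$ and the average-age formula from the answer as given; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def fraction_of_age(years):
    """Coefficient of z^years in z^3 / (3 - 2z)^3 (zero for years < 3)."""
    k = years - 3
    if k < 0:
        return Fraction(0)
    # [z^k] (1 - 2z/3)^(-3) = C(k + 2, 2) (2/3)^k, and the prefactor is 1/27.
    return Fraction(comb(k + 2, 2) * 2 ** k, 3 ** (k + 3))

def average_age(n):
    """Average age of bottled sherry n years after the process started, per part (c)."""
    return 9 - Fraction(2, 3) ** n * (3 * n * n + 21 * n + 72) / 8

print(float(fraction_of_age(25)))
print(float(average_age(10)), float(average_age(11)))

# The long-run mean age implied by the pgf, summed far enough for the tail to be negligible.
long_run_mean = sum(y * float(fraction_of_age(y)) for y in range(3, 400))
print(long_run_mean)
```

The 25-year-old fraction is about .00137, the average age first exceeds 8 at n = 11, and the long-run mean is 9 up to rounding.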

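8.46 (a) $P(w,z) = 1 + \frac12(wP(w,z) + zP(w,z)) = (1 - \frac12(w+z))^{-1}$, hence
$p_{m,n} = 2^{-m-n}\binom{m+n}{n}$. (b) $P_k(w,z) = \frac12(w^k + z^k)P(w,z)$; hence

$p_{k,m,n} = 2^{k-1-m-n}\left(\binom{m+n-k}{m} + \binom{m+n-k}{n}\right)$.

(c) $\sum_k kp_{k,n,n} = \sum_{k=0}^{n} k2^{k-2n}\binom{2n-k}{n} = \sum_{k=0}^{n} (n-k)2^{-n-k}\binom{n+k}{n}$; this can be
summed using (5.20):

$\sum_{k=0}^{n} 2^{-n-k}\left((2n+1)\binom{n+k}{n} - (n+1)\binom{n+1+k}{n+1}\right)$

$\qquad = (2n+1) - (n+1)2^{-n}\left(2^{n+1} - 2^{-n-1}\binom{2n+2}{n+1}\right) = \frac{2n+1}{2^{2n}}\binom{2n}{n} - 1$.

(The methods of Chapter 9 show that this is $2\sqrt{n/\pi} - 1 + O(n^{-1/2})$.)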
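
The sum in part (c) of answer 8.46 can be compared with a closed form and with the asymptotic estimate. In this sketch the closed form $(2n+1)\binom{2n}{n}/4^n - 1$ is what the summation via (5.20) works out to; the function names are made up for the illustration.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def direct_sum(n):
    """sum_{k=0}^{n} k 2^(k-2n) C(2n-k, n), evaluated exactly."""
    return sum(Fraction(k * comb(2 * n - k, n), 2 ** (2 * n - k)) for k in range(n + 1))

def closed_form(n):
    """(2n+1) C(2n, n) / 4^n - 1."""
    return Fraction((2 * n + 1) * comb(2 * n, n), 4 ** n) - 1

for n in (1, 2, 10, 100):
    print(n, float(direct_sum(n)), float(closed_form(n)), 2 * sqrt(n / pi) - 1)
```

The two exact columns coincide (for example 1/2 at n = 1 and 7/8 at n = 2), and the estimate $2\sqrt{n/\pi} - 1$ is already within a few percent at n = 100.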

8.47 After $n$ irradiations there are $n + 2$ equally likely receptors. Let the
random variable $X_n$ denote the number of diphages present; then $X_{n+1} =
X_n + Y_n$, where $Y_n = -1$ if the $(n+1)$st particle hits a diphage receptor
(conditional probability $2X_n/(n+2)$) and $Y_n = +2$ otherwise. Hence

$EX_{n+1} = EX_n + EY_n = EX_n - 2EX_n/(n+2) + 2(1 - 2EX_n/(n+2))$.

The recurrence $(n+2)EX_{n+1} = (n-4)EX_n + 2n + 4$ can be solved if we multiply
both sides by the summation factor $(n+1)^{\underline{5}}$; or we can guess the answer and
prove it by induction: $EX_n = (2n+4)/7$ for all $n > 4$. (Incidentally, there
are always two diphages and one triphage after five steps, regardless of the
configuration after four.)
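
The recurrence in answer 8.47 has the pleasant feature that the factor $n - 4$ wipes out the initial condition. The sketch below iterates it from several arbitrary (hypothetical) values of $EX_4$ to illustrate this; the function name is invented here.

```python
from fractions import Fraction

def expected_diphages(n, ex4):
    """EX_n for n >= 4, from (k + 2) EX_{k+1} = (k - 4) EX_k + 2k + 4, given EX_4 = ex4."""
    ex = Fraction(ex4)
    for k in range(4, n):
        ex = ((k - 4) * ex + 2 * k + 4) / (k + 2)
    return ex

for start in (0, 1, Fraction(7, 3)):        # arbitrary starting values for EX_4
    print([expected_diphages(n, start) for n in range(5, 9)])
```

Every starting value gives $EX_5 = 2$, consistent with the remark that there are always two diphages after five steps, and thereafter $EX_n = (2n+4)/7$.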

8.48 (a) The distance between frisbees (measured so as to make it an even
number) is either 0, 2, or 4 units, initially 4. The corresponding generating
functions $A$, $B$, $C$ (where, say, $[z^n]\,C$ is the probability of distance 4 after $n$
throws) satisfy

$A = \frac14zB$, $\quad B = \frac12zB + \frac14zC$, $\quad C = 1 + \frac14zB + \frac34zC$.

It follows that $A = z^2/(16 - 20z + 5z^2) = z^2/F(z)$, and we have Mean(A) =
2 - Mean(F) = 12, Var(A) = -Var(F) = 100. (A more difficult but more
amusing solution factors $A$ as follows:

$A = \dfrac{p_1z}{1-q_1z}\cdot\dfrac{p_2z}{1-q_2z} = \dfrac{p_2}{p_2-p_1}\,\dfrac{p_1z}{1-q_1z} + \dfrac{p_1}{p_1-p_2}\,\dfrac{p_2z}{1-q_2z}$,

where $p_1 = \phi^2/4 = (3+\sqrt5)/8$, $p_2 = \hat\phi^2/4 = (3-\sqrt5)/8$, and $p_1 + q_1 =
p_2 + q_2 = 1$. Thus, the game is equivalent to having two biased coins whose
heads probabilities are $p_1$ and $p_2$; flip the coins one at a time until they
have both come up heads, and the total number of flips will have the same
distribution as the number of frisbee throws. The mean and variance of the
waiting times for these two coins are respectively $6 \mp 2\sqrt5$ and $50 \mp 22\sqrt5$,
hence the total mean and variance are 12 and 100 as before.)

(b) Expanding the generating function in partial fractions makes it
possible to sum the probabilities. (Note that $\sqrt5/(4\phi) + \phi^2/4 = 1$, so the
answer can be stated in terms of powers of $\phi$.) The game will last more than
$n$ steps with probability $5^{(n-1)/2}4^{-n}(\phi^{n+2} - \phi^{-n-2})$; when $n$ is even this is
$5^{n/2}4^{-n}F_{n+2}$. So the answer is $5^{50}4^{-100}F_{102} \approx .00005$.
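
The closed form in part (b) of answer 8.48 can be checked against a direct simulation of the two-state chain from part (a). The sketch below tracks the exact probabilities of distance 2 and distance 4 after each throw, with transition weights read off the equations for $B$ and $C$; the function names are illustrative.

```python
from fractions import Fraction

def prob_longer_than(n):
    """Exact probability that the game is still going after n throws (start at distance 4)."""
    b, c = Fraction(0), Fraction(1)            # distance 2, distance 4
    for _ in range(n):
        # B = (1/2) z B + (1/4) z C  and  C = 1 + (1/4) z B + (3/4) z C
        b, c = b / 2 + c / 4, b / 4 + 3 * c / 4
    return b + c

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def closed_form_even(n):
    """5^(n/2) 4^(-n) F_{n+2}, valid for even n."""
    return Fraction(5 ** (n // 2) * fib(n + 2), 4 ** n)

print(float(prob_longer_than(100)), float(closed_form_even(100)))
```

Both expressions agree exactly for even n, and at n = 100 they evaluate to roughly 5.1e-5.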

8.49 (a) If $n > 0$, $P_N(0,n) = \frac12[N=0] + \frac14P_{N-1}(0,n) + \frac14P_{N-1}(1,n-1)$;
$P_N(m,0)$ is similar; $P_N(0,0) = [N = 0]$. Hence

$g_{m,n} = \frac14zg_{m-1,n+1} + \frac12zg_{m,n} + \frac14zg_{m+1,n-1}$;

$g_{0,n} = \frac12 + \frac14zg_{0,n} + \frac14zg_{1,n-1}$; etc.

(b) $g'_{m,n} = 1 + \frac14g'_{m-1,n+1} + \frac12g'_{m,n} + \frac14g'_{m+1,n-1}$; $g'_{0,n} = \frac12 + \frac14g'_{0,n} + \frac14g'_{1,n-1}$; etc.
By induction on $m$, we have $g'_{m,n} = (2m+1)g'_{0,m+n} - 2m^2$ for all $m, n \ge 0$.
And since $g'_{m,0} = g'_{0,m}$, we must have $g'_{m,n} = m + n + 2mn$. (c) The recurrence
is satisfied when $mn > 0$, because

$\sin(2m+1)\theta = \dfrac{1}{\cos^2\theta}\left(\dfrac{\sin(2m-1)\theta}{4} + \dfrac{\sin(2m+1)\theta}{2} + \dfrac{\sin(2m+3)\theta}{4}\right)$;

this is a consequence of the identity $\sin(x-y) + \sin(x+y) = 2\sin x\cos y$. So
all that remains is to check the boundary conditions.

8.50  (a)  Using  the  hint,  we  get 


now look at the coefficient of $z^{3+l}$. (b) $H(z) = \frac23 + \frac{5}{27}z + \frac12\sum_{l\ge0} c_{3+l}z^{2+l}$.

(c) Let $r = \sqrt{(1-z)(9-z)}$. One can show that $(z-3+r)(z-3-r) = 4z$,
and hence that $(r/(1-z) + 2)^2 = (13 - 5z + 4r)/(1-z) = (9 - H(z))/(1 - H(z))$.

(d) Evaluating the first derivative at $z = 1$ shows that Mean(H) = 1. The
second derivative diverges at $z = 1$, so the variance is infinite.

8.51 (a) Let $H_n(z)$ be the pgf for your holdings after $n$ rounds of play, with
$H_0(z) = z$. The distribution for $n$ rounds is

$H_{n+1}(z) = H_n(H(z))$,

so the result is true by induction (using the amazing identity of the preceding
problem). (b) $g_n = H_n(0) - H_{n-1}(0) = 4/n(n+1)(n+2) = 4(n-1)^{\underline{-3}}$. The
mean is 2, and the variance is infinite. (c) The expected number of tickets you
buy on the $n$th round is Mean($H_n$) = 1, by exercise 15. So the total expected
number of tickets is infinite. (Thus, you almost surely lose eventually, and you
expect to lose after the second game, yet you also expect to buy an infinite
number of tickets.) (d) Now the pgf after $n$ games is $H_n(z)^2$, and the method
of part (b) yields a mean of $16 - \frac43\pi^2 \approx 2.8$. (The sum $\sum_{k\ge1} 1/k^2 = \pi^2/6$
shows up here.)

8.52 If $\omega$ and $\omega'$ are events with $\Pr(\omega) > \Pr(\omega')$, then a sequence of
$n$ independent experiments will encounter $\omega$ more often than $\omega'$, with high
probability, because $\omega$ will occur very nearly $n\Pr(\omega)$ times. Consequently,
as $n \to \infty$, the probability approaches 1 that the median or mode of the
values of $X$ in a sequence of independent trials will be a median or mode of
the random variable $X$.

8.53 We can disprove the statement, even in the special case that each
variable is 0 or 1. Let $p_0 = \Pr(X = Y = Z = 0)$, $p_1 = \Pr(X = Y = \bar Z = 0)$, $\ldots$,
$p_7 = \Pr(\bar X = \bar Y = \bar Z = 0)$, where $\bar X = 1 - X$. Then $p_0 + p_1 + \cdots + p_7 = 1$, and
the variables are independent in pairs if and only if we have

$(p_4+p_5+p_6+p_7)(p_2+p_3+p_6+p_7) = p_6 + p_7$,

$(p_4+p_5+p_6+p_7)(p_1+p_3+p_5+p_7) = p_5 + p_7$,

$(p_2+p_3+p_6+p_7)(p_1+p_3+p_5+p_7) = p_3 + p_7$.

But $\Pr(X + Y = Z = 0) \ne \Pr(X + Y = 0)\Pr(Z = 0) \iff p_0 \ne (p_0+p_1)(p_0 +
p_2 + p_4 + p_6)$. One solution is

$p_0 = p_3 = p_5 = p_6 = 1/4$; $\quad p_1 = p_2 = p_4 = p_7 = 0$.

This is equivalent to flipping two fair coins and letting $X$ = (the first coin
is heads), $Y$ = (the second coin is heads), $Z$ = (the coins differ). Another
example, with all probabilities nonzero, is

$p_0 = 4/64$, $\quad p_1 = p_2 = p_4 = 5/64$,

$p_3 = p_5 = p_6 = 10/64$, $\quad p_7 = 15/64$.

For this reason we say that $n$ variables $X_1, \ldots, X_n$ are independent if

$\Pr(X_1 = x_1 \text{ and } \cdots \text{ and } X_n = x_n) = \Pr(X_1 = x_1) \ldots \Pr(X_n = x_n)$;

pairwise independence isn't enough to guarantee this.
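
Both counterexamples in answer 8.53 can be verified mechanically. The sketch below assumes the indexing implied by the three product equations, namely that $p_k$ is the probability of the outcome whose bits are $k = 4X + 2Y + Z$; the helper names are invented for this illustration.

```python
from fractions import Fraction
from itertools import product

def prob(p, condition):
    """Total probability of the outcomes (x, y, z) satisfying the given predicate."""
    return sum(p[4 * x + 2 * y + z]
               for x, y, z in product((0, 1), repeat=3) if condition(x, y, z))

def pairwise_independent(p):
    """Check Pr(U=a and V=b) = Pr(U=a) Pr(V=b) for each pair of coordinates."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((0, 1), repeat=2):
            joint = prob(p, lambda *v: v[i] == a and v[j] == b)
            if joint != prob(p, lambda *v: v[i] == a) * prob(p, lambda *v: v[j] == b):
                return False
    return True

def sum_independent_of_z(p):
    """Check whether X + Y is independent of Z."""
    return all(prob(p, lambda x, y, z: x + y == s and z == c)
               == prob(p, lambda x, y, z: x + y == s) * prob(p, lambda x, y, z: z == c)
               for s in (0, 1, 2) for c in (0, 1))

coins = [Fraction(v, 4) for v in (1, 0, 0, 1, 0, 1, 1, 0)]
nonzero = [Fraction(v, 64) for v in (4, 5, 5, 10, 5, 10, 10, 15)]
for p in (coins, nonzero):
    print(pairwise_independent(p), sum_independent_of_z(p))
```

For both distributions the first check succeeds and the second fails, which is exactly the point of the counterexample.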

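8.54 (See exercise 27 for notation.) We have

$E(\Sigma_2^2) = n\mu_4 + n(n-1)\mu_2^2$;

$E(\Sigma_2\Sigma_1^2) = n\mu_4 + 2n(n-1)\mu_3\mu_1 + n(n-1)\mu_2^2 + n(n-1)(n-2)\mu_2\mu_1^2$;

$E(\Sigma_1^4) = n\mu_4 + 4n(n-1)\mu_3\mu_1 + 3n(n-1)\mu_2^2
 + 6n(n-1)(n-2)\mu_2\mu_1^2 + n(n-1)(n-2)(n-3)\mu_1^4$;

it follows that $V(\hat VX) = \kappa_4/n + 2\kappa_2^2/(n-1)$.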

8.55 There are $A = \frac{1}{17}\cdot52!$ permutations with $X = Y$, and $B = \frac{16}{17}\cdot52!$
permutations with $X \ne Y$. After the stated procedure, each permutation
with $X = Y$ occurs with probability $\frac{1}{17}/((1 - \frac{16}{17}p)A)$, because we return
to step S1 with probability $\frac{16}{17}p$. Similarly, each permutation with $X \ne Y$
occurs with probability $\frac{16}{17}(1-p)/((1 - \frac{16}{17}p)B)$. Choosing $p = \frac14$ makes
$\Pr(X = x \text{ and } Y = y) = \frac{1}{169}$ for all $x$ and $y$. (We could therefore make two flips
of a fair coin and go back to S1 if both come up heads.)
8.56 If $m$ is even, the frisbees always stay an odd distance apart and the game lasts forever. If $m = 2l + 1$, the relevant generating functions are
\[ G_m = \tfrac14 zA_1, \]
\[ A_1 = \tfrac12 zA_1 + \tfrac14 zA_2, \]
\[ A_k = \tfrac14 zA_{k-1} + \tfrac12 zA_k + \tfrac14 zA_{k+1}, \quad\text{for } 1 < k < l, \]
\[ A_l = \tfrac14 zA_{l-1} + \tfrac34 zA_l + 1. \]
(The coefficient $[z^n]\,A_k$ is the probability that the distance between frisbees is $2k$ after $n$ throws.) Taking a clue from the similar equations in exercise 49, we set $z = 1/\cos^2\theta$ and $A_1 = X\sin 2\theta$, where $X$ is to be determined. It follows by induction (not using the equation for $A_l$) that $A_k = X\sin 2k\theta$. Therefore we want to choose $X$ such that
\[ \Bigl(1 - \frac{3}{4\cos^2\theta}\Bigr) X\sin 2l\theta = 1 + \frac{1}{4\cos^2\theta}\, X\sin(2l-2)\theta. \]
It turns out that $X = 2\cos^2\theta/(\sin\theta\cos(2l+1)\theta)$, hence
\[ G_m = \frac{\cos\theta}{\cos m\theta}. \]
The denominator vanishes when $\theta$ is an odd multiple of $\pi/(2m)$; thus $1 - q_k z$ is a root of the denominator for $1 \le k \le l$, and the stated product representation must hold. To find the mean and variance we can write
\[ G_m = (1 - \tfrac12\theta^2 + \tfrac1{24}\theta^4 - \cdots)/(1 - \tfrac12 m^2\theta^2 + \tfrac1{24}m^4\theta^4 - \cdots) \]
\[ = 1 + \tfrac12(m^2-1)\theta^2 + \tfrac1{24}(5m^4 - 6m^2 + 1)\theta^4 + \cdots \]
\[ = 1 + \tfrac12(m^2-1)(\tan\theta)^2 + \tfrac1{24}(5m^4 - 14m^2 + 9)(\tan\theta)^4 + \cdots \]
\[ = 1 + G_m'(1)(\tan\theta)^2 + \tfrac12 G_m''(1)(\tan\theta)^4 + \cdots, \]
because $\tan^2\theta = z - 1$ and $\tan\theta = \theta + \frac13\theta^3 + \cdots$. So we have $\mathrm{Mean}(G_m) = \frac12(m^2-1)$ and $\mathrm{Var}(G_m) = \frac16 m^2(m^2-1)$. (Note that this implies the identities
\[ \frac{m^2-1}{2} = \sum_{k=1}^{(m-1)/2} \Bigl(1\Big/\sin\frac{(2k-1)\pi}{2m}\Bigr)^2, \]
\[ \frac{m^2(m^2-1)}{6} = \sum_{k=1}^{(m-1)/2} \Bigl(\cot\frac{(2k-1)\pi}{2m}\Big/\sin\frac{(2k-1)\pi}{2m}\Bigr)^2. \]
The third cumulant of this distribution is $\frac1{30}m^2(m^2-1)(4m^2-1)$; but the pattern of nice cumulant factorizations stops there. There's a much simpler way to derive the mean: We have $G_m + A_1 + \cdots + A_l = z(A_1 + \cdots + A_l) + 1$, hence when $z = 1$ we have $G_m' = A_1 + \cdots + A_l$. Since $G_m = 1$ when $z = 1$, an easy induction shows that $A_k = 4k$.)

Trigonometry wins again. Is there a connection with pitching pennies along the angles of the m-gon?
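The two trigonometric identities implied by the mean and variance are easy to test numerically. The following sketch evaluates both sums for a few odd $m$; the function names are hypothetical and floating-point agreement is only approximate.

```python
import math


def angles(m):
    """Angles (2k-1)*pi/(2m) for k = 1, ..., (m-1)/2, with m odd."""
    return [(2 * k - 1) * math.pi / (2 * m) for k in range(1, (m - 1) // 2 + 1)]


def sum_csc_squared(m):
    """Sum of (1/sin t)^2 over the angles; should equal (m^2 - 1)/2."""
    return sum(1 / math.sin(t) ** 2 for t in angles(m))


def sum_cot_csc_squared(m):
    """Sum of (cot t / sin t)^2 over the angles; should equal m^2 (m^2 - 1)/6."""
    return sum((math.cos(t) / math.sin(t) ** 2) ** 2 for t in angles(m))


for m in (3, 5, 7, 9):
    print(m, sum_csc_squared(m), (m * m - 1) / 2,
          sum_cot_csc_squared(m), m * m * (m * m - 1) / 6)
```

For $m = 3$ the distribution is geometric with success probability $1/4$, so the mean 4 and variance 12 can also be checked by hand.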

8.57 We have $A{:}A \ge 2^{l-1}$ and $B{:}B < 2^{l-1} + 2^{l-3}$ and $B{:}A \ge 2^{l-2}$, hence $B{:}B - B{:}A \ge A{:}A - A{:}B$ is possible only if $A{:}B > 2^{l-3}$. This means that $\bar\tau_2 = \tau_3$, $\tau_1 = \tau_4$, $\tau_2 = \tau_5$, ..., $\tau_{l-3} = \tau_l$. But then $A{:}A \approx 2^{l-1} + 2^{l-4} + \cdots$, $A{:}B \approx 2^{l-3} + 2^{l-6} + \cdots$, $B{:}A \approx 2^{l-2} + 2^{l-5} + \cdots$, and $B{:}B \approx 2^{l-1} + 2^{l-4} + \cdots$; hence $B{:}B - B{:}A$ is less than $A{:}A - A{:}B$ after all. (Sharper results have been obtained by Guibas and Odlyzko [138], who show that Bill's chances are always maximized with one of the two patterns $\mathrm{H}\tau_1\ldots\tau_{l-1}$ or $\mathrm{T}\tau_1\ldots\tau_{l-1}$.)

8.58  According  to  (8.82),  we  want  B:B  — B:A  > A:A  — A:B.  One  solution  is 
A = TTHH,  B = HHH. 

8.59 (a) Two cases arise depending on whether $h_k \ne h_n$ or $h_k = h_n$:
\[ G(w,z) = \frac{m-1}{m}\Bigl(\frac{m-2+w+z}{m}\Bigr)^{k-1} w \Bigl(\frac{m-1+z}{m}\Bigr)^{n-k-1} z \;+\; \frac{1}{m}\Bigl(\frac{m-1+wz}{m}\Bigr)^{k-1} wz \Bigl(\frac{m-1+z}{m}\Bigr)^{n-k-1} z. \]
(b) We can either argue algebraically, taking partial derivatives of $G(w,z)$ with respect to $w$ and $z$ and setting $w = z = 1$; or we can argue combinatorially: Whatever the values of $h_1, \ldots, h_{n-1}$, the expected value of $P(h_1, \ldots, h_{n-1}, h_n; n)$ is the same (averaged over $h_n$), because the hash sequence $(h_1, \ldots, h_{n-1})$ determines a sequence of list sizes $(n_1, n_2, \ldots, n_m)$ such that the stated expected value is $((n_1+1) + (n_2+1) + \cdots + (n_m+1))/m = (n - 1 + m)/m$. Therefore the random variable $E\,P(h_1, \ldots, h_n; n)$ is independent of $(h_1, \ldots, h_{n-1})$, hence independent of $P(h_1, \ldots, h_n; k)$.

8.60 If $1 \le k < l \le n$, the previous exercise shows that the coefficient of $s_k s_l$ in the variance of the average is zero. Therefore we need only consider the coefficient of $s_k^2$, which is
\[ \sum_{1\le h_1,\ldots,h_n\le m} \frac{P(h_1,\ldots,h_n;k)^2}{m^n} - \Bigl(\sum_{1\le h_1,\ldots,h_n\le m} \frac{P(h_1,\ldots,h_n;k)}{m^n}\Bigr)^2, \]
the variance of $((m-1+z)/m)^{k-1}z$; and this is $(k-1)(m-1)/m^2$ as in exercise 30.


8.61 The pgf $D_n(z)$ satisfies the recurrence
\[ D_0(z) = z; \]
\[ D_n(z) = z^2 D_{n-1}(z) + 2(1 - z^3)D_{n-1}'(z)/(n+1), \quad\text{for } n > 0. \]
The mean satisfies $D_n'(1) = 2 + (n-5)D_{n-1}'(1)/(n+1)$, which gives $D_n'(1) = \frac27(n+2)$ for all $n \ge 5$. We can now derive the recurrence
\[ D_n''(1) = (n-11)D_{n-1}''(1)/(n+1) + (8n-2)/7, \]
which has the solution $\frac{2}{637}(n+2)(26n+15)$ for all $n \ge 11$ (regardless of initial conditions). Hence the variance $D_n''(1) + D_n'(1) - D_n'(1)^2$ comes to $\frac{108}{637}(n+2)$ for $n \ge 11$.
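The recurrence for $D_n(z)$ can be iterated exactly on polynomial coefficients, which gives an independent check of the mean and variance. This is a sketch with hypothetical helper names; polynomials are lists of `Fraction` coefficients, lowest degree first, and the closed forms $\frac27(n+2)$ and $\frac{108}{637}(n+2)$ are the values being tested.

```python
from fractions import Fraction


def derivative(p):
    """Coefficient list of p'(z)."""
    return [k * c for k, c in enumerate(p)][1:]


def multiply(a, b):
    """Coefficient list of a(z) * b(z)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def add(a, b):
    """Coefficient list of a(z) + b(z)."""
    size = max(len(a), len(b))
    a = a + [Fraction(0)] * (size - len(a))
    b = b + [Fraction(0)] * (size - len(b))
    return [x + y for x, y in zip(a, b)]


def pgf(n):
    """D_n(z), built from D_0(z) = z by the recurrence in the answer."""
    d = [Fraction(0), Fraction(1)]
    for m in range(1, n + 1):
        first = [Fraction(0), Fraction(0)] + d                 # z^2 D_{m-1}(z)
        factor = [Fraction(2, m + 1), 0, 0, Fraction(-2, m + 1)]  # 2(1 - z^3)/(m+1)
        d = add(first, multiply(factor, derivative(d)))
    return d


def mean_and_variance(n):
    """(D_n'(1), D_n''(1) + D_n'(1) - D_n'(1)^2)."""
    d = pgf(n)
    d1 = sum(derivative(d))
    d2 = sum(derivative(derivative(d)))
    return d1, d2 + d1 - d1 * d1


print(mean_and_variance(11))
```

Summing a coefficient list evaluates the polynomial at $z = 1$, so no separate evaluation routine is needed.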

8.62 (Another question asks if a given sequence of purported cumulants comes from any distribution whatever; for example, $\kappa_2$ must be nonnegative, and $\kappa_4 + 3\kappa_2^2 = E((X-\mu)^4)$ must be at least $(E((X-\mu)^2))^2 = \kappa_2^2$, etc. A necessary and sufficient condition for this other problem was found by Hamburger [6], [144].)

8.63 (Another question asks if there is a simple rule to tell whether H or T is preferable.) Conway conjectures that no such ties exist, and moreover that there is only one cycle in the directed graph on $2^l$ vertices that has an arc from each sequence to its "best beater."

9.1 True if the functions are all positive. But otherwise we might have, say, $f_1(n) = n^3 + n^2$, $f_2(n) = -n^3$, $g_1(n) = n^4 + n$, $g_2(n) = -n^4$.

9.2 (a) We have $n^{\ln n} \prec c^n \prec (\ln n)^n$, since $(\ln n)^2 \prec n\ln c \prec n\ln\ln n$. (b) $n^{\ln\ln\ln n} \prec (\ln n)! \prec n^{\ln\ln n}$. (c) Take logarithms to show that $(n!)!$ wins. (d) $F_{\lceil H_n\rceil}^2 \asymp \phi^{2\ln n} = n^{2\ln\phi}$; $H_{F_n} \sim n\ln\phi$ wins because $\phi^2 = \phi + 1 < e$.

9.3 Replacing $kn$ by $O(n)$ requires a different $C$ for each $k$; but each $O$ stands for a single $C$. In fact, the context of this $O$ requires it to stand for a set of functions of two variables $k$ and $n$. It would be correct to write $\sum_{k=1}^{n} kn = \sum_{k=1}^{n} O(n^2) = O(n^3)$.

9.4 For example, $\lim_{n\to\infty} O(1/n) = 0$. On the left, $O(1/n)$ is the set of all functions $f(n)$ such that there are constants $C$ and $n_0$ with $|f(n)| \le C/n$ for all $n \ge n_0$. The limit of all functions in that set is 0, so the left-hand side is the singleton set $\{0\}$. On the right, there are no variables; 0 represents $\{0\}$, the (singleton) set of all "functions of no variables, whose value is zero." (Can you see the inherent logic here? If not, come back to it next year; you probably can still manipulate O-notation even if you can't shape your intuitions into rigorous formalisms.)

9.5  Let  f(n)  = n2  and  g(n)  = 1;  then  n is  in  the  left  set  but  not  in  the 
right,  so  the  statement  is  false. 

9.6 $n\ln n + \gamma n + O(\sqrt{n}\,\ln n)$.

9.7 $(1 - e^{-1/n})^{-1} = nB_0 - B_1 + B_2 n^{-1}/2! + \cdots = n + \frac12 + O(n^{-1})$.

9.8 For example, let $f(n) = \lfloor n/2\rfloor!^2 + n$, $g(n) = (\lceil n/2\rceil - 1)!\,\lceil n/2\rceil! + n$. These functions, incidentally, satisfy $f(n) = O(n\,g(n))$ and $g(n) = O(n\,f(n))$; more extreme examples are clearly possible.
9.9 (For completeness, we assume that there is a side condition $n \to \infty$, so that two constants are implied by each $O$.) Every function on the left has the form $a(n) + b(n)$, where there exist constants $m_0$, $B$, $n_0$, $C$ such that $|a(n)| \le B|f(n)|$ for $n \ge m_0$ and $|b(n)| \le C|g(n)|$ for $n \ge n_0$. Therefore the left-hand function is at most $\max(B, C)(|f(n)| + |g(n)|)$, for $n \ge \max(m_0, n_0)$, so it is a member of the right side.

9.10 If $g(x)$ belongs to the left, so that $g(x) = \cos y$ for some $y$, where $|y| \le C|x|$ for some $C$, then $0 \le 1 - g(x) = 2\sin^2(y/2) \le \frac12 y^2 \le \frac12 C^2x^2$; hence the set on the left is contained in the set on the right, and the formula is true.

9.11 The proposition is true. For if, say, $|x| \le |y|$, we have $(x+y)^2 \le 4y^2$. Thus $(x+y)^2 = O(x^2) + O(y^2)$. Thus $O(x+y)^2 = O((x+y)^2) = O(O(x^2) + O(y^2)) = O(O(x^2)) + O(O(y^2)) = O(x^2) + O(y^2)$.

9.12 $1 + 2/n + O(n^{-2}) = (1 + 2/n)(1 + O(n^{-2})/(1 + 2/n))$ by (9.26), and $1/(1 + 2/n) = O(1)$; now use (9.26).

9.13 $n^n(1 + 2n^{-1} + O(n^{-2}))^n = n^n\exp(n(2n^{-1} + O(n^{-2}))) = e^2n^n + O(n^{n-1})$.

9.14 It is $n^{n+\beta}\exp((n+\beta)(\alpha/n - \frac12\alpha^2/n^2 + O(n^{-3})))$.

9.15 $\ln\binom{3n}{n,n,n} = 3n\ln 3 - \ln n + \frac12\ln 3 - \ln 2\pi + (\frac1{36} - \frac14)n^{-1} + O(n^{-3})$, so the answer is
\[ \frac{3^{3n+1/2}}{2\pi n}\Bigl(1 - \tfrac29 n^{-1} + \tfrac2{81}n^{-2} + O(n^{-3})\Bigr). \]
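The estimate for the trinomial coefficient $(3n)!/n!^3$ can be compared with the exact value on a logarithmic scale. This is a rough numerical sketch with hypothetical function names; `math.lgamma` supplies $\ln k!$, and the comparison is only as good as double precision allows.

```python
import math


def log_trinomial(n):
    """ln((3n)! / (n!)^3), via the log-gamma function."""
    return math.lgamma(3 * n + 1) - 3 * math.lgamma(n + 1)


def log_estimate(n):
    """ln of 3^(3n+1/2)/(2 pi n) * (1 - 2/(9n) + 2/(81 n^2))."""
    correction = 1 - 2 / (9 * n) + 2 / (81 * n * n)
    return (3 * n + 0.5) * math.log(3) - math.log(2 * math.pi * n) + math.log(correction)


for n in (10, 100, 1000):
    print(n, log_trinomial(n) - log_estimate(n))
```

The printed differences should shrink roughly like $n^{-3}$, in line with the stated error term.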


9.16 If $l$ is any integer in the range $a \le l < b$ we have
\[ \int_0^1 B(x)f(l+x)\,dx = \int_{1/2}^1 B(x)f(l+x)\,dx - \int_0^{1/2} B(1-x)f(l+x)\,dx = \int_{1/2}^1 B(x)\bigl(f(l+x) - f(l+1-x)\bigr)\,dx. \]
Since $l + x \ge l + 1 - x$ when $x \ge \frac12$, this integral is positive when $f(x)$ is nondecreasing.

9.17 $\sum_{m\ge0} B_m(\frac12)z^m/m! = ze^{z/2}/(e^z - 1) = z/(e^{z/2} - 1) - z/(e^z - 1)$.

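9.18 The text's derivation for the case $\alpha = 1$ generalizes to give
\[ b_k(n) = \frac{2^{(2n+1/2)\alpha}}{(2\pi n)^{\alpha/2}}\, e^{-k^2\alpha/n}, \qquad c_k(n) = 2^{2n\alpha}\, n^{-(1+\alpha)/2+3\epsilon}\, e^{-k^2\alpha/n}; \]
the answer is $2^{2n\alpha}(\pi n)^{(1-\alpha)/2}\alpha^{-1/2}(1 + O(n^{-1/2+3\epsilon}))$.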
9.19 $H_{10} = 2.928968254 \approx 2.928968256$; $10! = 3628800 \approx 3628712.4$; $B_{10} = 0.075757576 \approx 0.075757494$; $\pi(10) = 4 \approx 10.0017845$; $e^{0.1} = 1.10517092 \approx 1.10517083$; $\ln 1.1 = 0.0953102 \approx 0.0953083$; $1.1111111 \approx 1.1111000$; $1.1^{0.1} = 1.00957658 \approx 1.00957643$. (The approximation to $\pi(n)$ gives more significant figures when $n$ is larger; for example, $\pi(10^9) = 50847534 \approx 50840742$.)

9.20 (a) Yes; the left side is $o(n)$ while the right side is equivalent to $O(n)$. (b) Yes; the left side is $e \cdot e^{O(1/n)}$. (c) No; the left side is about $\sqrt{n}$ times the bound on the right.

9.21 We have $P_n = m = n(\ln m - 1 - 1/\ln m + O(1/\log n)^2)$, where
\[ \ln m = \ln n + \ln\ln m - 1/\ln n + \ln\ln n/(\ln n)^2 + O(1/\log n)^2; \]
\[ \ln\ln m = \ln\ln n + \frac{\ln\ln n}{\ln n} - \frac{(\ln\ln n)^2}{2(\ln n)^2} + \frac{\ln\ln n}{(\ln n)^2} + O(1/\log n)^2. \]
It follows that
\[ P_n = n\Bigl(\ln n + \ln\ln n - 1 + \frac{\ln\ln n - 2}{\ln n} - \frac{\frac12(\ln\ln n)^2 - 3\ln\ln n}{(\ln n)^2} + O(1/\log n)^2\Bigr). \]
(A slightly better approximation replaces this $O(1/\log n)^2$ by the quantity $-5/(\ln n)^2 + O(\log\log n/\log n)^3$; then we estimate $P_{1000000} \approx 15483612.4$.)

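9.22 Replace $O(n^{-2k})$ by $-\frac1{12}n^{-2k} + O(n^{-4k})$ in the expansion of $H_{n^k}$; this replaces $O(\Sigma_3(n^2))$ by $-\frac1{12}\Sigma_3(n^2) + O(\Sigma_3(n^4))$ in (9.53). We have
\[ \Sigma_3(n) = \tfrac34 n^{-1} + \tfrac5{36}n^{-2} + O(n^{-3}), \]
hence the term $O(n^{-2})$ in (9.54) can be replaced by $-\frac{19}{144}n^{-2} + O(n^{-3})$.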

9.23 $nh_n = \sum_{0\le k<n} h_k/(n-k) + 2cH_n/(n+1)(n+2)$. Choose $c = e^{\pi^2/6} = \sum_{k\ge0} g_k$ so that $\sum_{k\ge0} h_k = 0$ and $h_n = O(\log n)/n^3$. The expansion of $\sum_{0\le k<n} h_k/(n-k)$ as in (9.60) now yields $nh_n = 2cH_n/(n+1)(n+2) + O(n^{-2})$, hence
\[ g_n = e^{\pi^2/6}\Bigl(\frac{n + 2\ln n + O(1)}{n^3}\Bigr). \]


9.24 (a) If $\sum_{k\ge0}|f(k)| < \infty$ and if $f(n-k) = O(f(n))$ when $0 \le k \le n/2$, we have
\[ \sum_{k=0}^{n} a_k b_{n-k} = \sum_{k=0}^{n/2} O(f(k))\,O(f(n)) + \sum_{k=n/2}^{n} O(f(n))\,O(f(n-k)), \]
which is $2\,O\bigl(f(n)\sum_{k\ge0}|f(k)|\bigr)$, so this case is proved. (b) But in this case if $a_n = b_n = \alpha^{-n}$, the convolution $(n+1)\alpha^{-n}$ is not $O(\alpha^{-n})$.

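9.25 $S_n/\binom{3n}{n} = \sum_{k=0}^{n} n^{\underline{k}}/(2n+1)^{\overline{k}}$. We may restrict the range of summation to $0 \le k \le (\log n)^2$, say. In this range $n^{\underline{k}} = n^k\bigl(1 - \binom{k}{2}/n + O(k^4/n^2)\bigr)$ and $(2n+1)^{\overline{k}} = (2n)^k\bigl(1 + \binom{k+1}{2}/2n + O(k^4/n^2)\bigr)$, so the summand is
\[ \frac{1}{2^k}\Bigl(1 - \frac{3k^2 - k}{4n} + O\Bigl(\frac{k^4}{n^2}\Bigr)\Bigr). \]
Hence the sum over $k$ is $2 - 4/n + O(1/n^2)$. Stirling's approximation can now be applied to $\binom{3n}{n} = (3n)!/(2n)!\,n!$, proving (9.2).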
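The estimate $2 - 4/n$ for the normalized sum is easy to test with exact integer arithmetic. The sketch assumes, as the first identity of the answer suggests, that $S_n = \sum_{k=0}^{n}\binom{3n}{k}$; the function name is hypothetical.

```python
from math import comb


def normalized_sum(n):
    """S_n / C(3n, n) with S_n = sum of C(3n, k) for 0 <= k <= n (assumed definition)."""
    s = sum(comb(3 * n, k) for k in range(n + 1))
    return s / comb(3 * n, n)   # exact integers, rounded once to float


for n in (10, 100, 1000):
    print(n, normalized_sum(n), 2 - 4 / n)
```

The gap between the two printed columns should fall off like $n^{-2}$.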

9.26 The minimum occurs at a term $B_{2m}/(2m)(2m-1)n^{2m-1}$ where $2m \approx 2\pi n + \frac32$, and this term is approximately equal to $1/(\pi e^{2\pi n}\sqrt{n}\,)$. The absolute error in $\ln n!$ is therefore too large to determine $n!$ exactly by rounding to an integer, when $n$ is greater than about $e^{2\pi+1}$.

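9.27 We may assume that $\alpha \ne -1$. Let $f(x) = x^\alpha$; the answer is
\[ \sum_{k=1}^{n} k^\alpha = C_\alpha + \frac{n^{\alpha+1}}{\alpha+1} + \frac{n^\alpha}{2} + \sum_{k=1}^{m} \frac{B_{2k}}{2k}\binom{\alpha}{2k-1} n^{\alpha-2k+1} + O(n^{\alpha-2m-1}). \]
(The constant $C_\alpha$ turns out to be $\zeta(-\alpha)$, which is in fact defined by this formula when $\alpha > -1$.)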

9.28 Take $f(x) = x\ln x$ in Euler's summation formula to get
\[ A \cdot n^{n^2/2+n/2+1/12}\, e^{-n^2/4}\bigl(1 + O(n^{-2})\bigr), \]
where $A \approx 1.282427$ is "Glaisher's constant."

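9.29 Let $f(x) = x^{-1}\ln x$. Then $f^{(2m)}(x) > 0$ for all large $x$, and we can write
\[ \sum_{k=1}^{n} \frac{\ln k}{k} = \frac{(\ln n)^2}{2} + \ln S + \frac{\ln n}{2n} + \theta_n\frac{1 - \ln n}{12n^2}, \qquad 0 < \theta_n < 1, \]
where $S \approx 0.929772$ is constant. Taking exponentials gives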


SV/+7(|  + 1X 


(In general if $f(x) = x^\alpha\ln x$, Euler's summation formula applies as in exercise 27, and the resulting constant is $-\zeta'(-\alpha)$ if $\alpha \ne -1$. Thus, the theory of the zeta function gives a closed form for Glaisher's constant in the previous exercise. We have $\ln S = \gamma_1$ in the notation of answer 9.57.)
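9.30 Let $g(x) = x^l e^{-x^2}$ and $f(x) = g(x/\sqrt{n})$. Then $n^{-l/2}\sum_{k\ge0} k^l e^{-k^2/n}$ is
\[ \int_0^\infty f(x)\,dx - \sum_{k=1}^{m} \frac{B_k}{k!}f^{(k-1)}(0) - (-1)^m\int_0^\infty \frac{B_m(\{x\})}{m!}f^{(m)}(x)\,dx \]
\[ = n^{1/2}\int_0^\infty g(x)\,dx - \sum_{k=1}^{m} \frac{B_k}{k!}\,n^{(1-k)/2}g^{(k-1)}(0) + O(n^{-m/2}). \]
Since $g(x) = x^l - x^{2+l}/1! + x^{4+l}/2! - x^{6+l}/3! + \cdots$, the derivatives $g^{(m)}(x)$ obey a simple pattern: $g^{(l+2i)}(0) = (-1)^i(l+2i)!/i!$, and all other derivatives vanish at 0. The answer is therefore
\[ \frac12 n^{(l+1)/2}\,\Gamma\Bigl(\frac{l+1}{2}\Bigr) - \frac{B_{l+1}}{(l+1)\,0!} + \frac{B_{l+3}\,n^{-1}}{(l+3)\,1!} - \frac{B_{l+5}\,n^{-2}}{(l+5)\,2!} + O(n^{-3}). \]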
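Euler's summation formula applied to $\sum_{k\ge0} k^l e^{-k^2/n}$ can be checked numerically. The sketch below uses the expansion obtained from $g^{(l+2i)}(0) = (-1)^i(l+2i)!/i!$, in which the Bernoulli terms are $\mp B_{l+1+2i}\,n^{-i}/((l+1+2i)\,i!)$; the function names and the small Bernoulli table are introduced here only for illustration.

```python
import math

# Bernoulli numbers B_0 .. B_8 (with B_1 = -1/2).
BERNOULLI = [1, -1 / 2, 1 / 6, 0, -1 / 30, 0, 1 / 42, 0, -1 / 30]


def direct_sum(l, n):
    """Sum of k^l * exp(-k^2/n) over k >= 0, truncated where terms are negligible."""
    limit = int(20 * math.sqrt(n)) + 1
    return sum(k**l * math.exp(-k * k / n) for k in range(limit + 1))


def estimate(l, n):
    """Leading Gamma-function term plus three Bernoulli corrections."""
    lead = 0.5 * n ** ((l + 1) / 2) * math.gamma((l + 1) / 2)
    return (lead
            - BERNOULLI[l + 1] / ((l + 1) * math.factorial(0))
            + BERNOULLI[l + 3] / ((l + 3) * math.factorial(1) * n)
            - BERNOULLI[l + 5] / ((l + 5) * math.factorial(2) * n * n))


for l in (0, 1, 2, 3):
    print(l, direct_sum(l, 100), estimate(l, 100))
```

For $l = 1$ and $n = 100$ the corrections are $-\frac1{12}$ and $-\frac1{12000}$, small enough that a wrong denominator would show up clearly in the comparison.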


9.31 The somewhat surprising identity $1/(c^{m-k} + c^m) + 1/(c^{m+k} + c^m) = 1/c^m$ makes the terms for $0 \le k \le 2m$ sum to $(m + \frac12)/c^m$. The remaining terms are
\[ \sum_{k\ge1} \frac{1}{c^{2m+k} + c^m} = \sum_{k\ge1}\Bigl(\frac{1}{c^{2m+k}} - \frac{1}{c^{3m+2k}} + \frac{1}{c^{4m+3k}} - \cdots\Bigr) = \frac{1}{c^{2m+1} - c^{2m}} - \frac{1}{c^{3m+2} - c^{3m}} + \cdots, \]
and this series can be truncated at any desired point, with an error not exceeding the first omitted term.

9.32 $H_n^{(2)} = \pi^2/6 - 1/n + O(n^{-2})$ by Euler's summation formula, since we know the constant; and $H_n$ is given by (9.89). So the answer is
\[ n\,e^{\gamma+\pi^2/6}\bigl(1 - \tfrac12 n^{-1} + O(n^{-2})\bigr). \]

The world's top three constants, $(e, \pi, \gamma)$, all appear in this answer.

9.33 We have $n^{\underline{k}}/n^{\overline{k}} = 1 - k(k-1)n^{-1} + \frac12 k^2(k-1)^2n^{-2} + O(k^6n^{-3})$; dividing by $k!$ and summing over $k \ge 0$ yields $e - en^{-1} + \frac72 en^{-2} + O(n^{-3})$.

9.34 $A = e^\gamma$; $B = 0$; $C = -\frac12 e^\gamma$; $D = \frac12 e^\gamma(1 - \gamma)$; $E = \frac18 e^\gamma$; $F = \frac1{12}e^\gamma(3\gamma + 1)$.

9.35 Since $1/k(\ln k + O(1)) = 1/k\ln k + O(1/k(\log k)^2)$, the given sum is $\sum_{k=2}^{n} 1/k\ln k + O(1)$. The remaining sum is $\ln\ln n + O(1)$ by Euler's summation formula.

9.36 This works out beautifully with Euler's summation formula:
\[ \sum_{0\le k<n} \frac{1}{n^2 + k^2} = \int_0^n \frac{dx}{n^2 + x^2} - \frac12\,\frac{1}{n^2+x^2}\bigg|_0^n + \frac{B_2}{2!}\,\frac{-2x}{(n^2+x^2)^2}\bigg|_0^n + O(n^{-5}). \]
Hence $S_n = \frac14\pi n^{-1} - \frac14 n^{-2} - \frac1{24}n^{-3} + O(n^{-5})$.
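A direct summation confirms the three-term estimate of answer 9.36. The sketch assumes that $S_n = \sum_{k=1}^{n} 1/(n^2 + k^2)$, which is what the sign of the $n^{-2}$ term indicates once the $k = 0$ and $k = n$ endpoint terms are exchanged; the function names are hypothetical.

```python
import math


def s_exact(n):
    """S_n = sum of 1/(n^2 + k^2) for 1 <= k <= n (assumed definition)."""
    return sum(1.0 / (n * n + k * k) for k in range(1, n + 1))


def s_estimate(n):
    """pi/(4n) - 1/(4 n^2) - 1/(24 n^3)."""
    return math.pi / (4 * n) - 1 / (4 * n * n) - 1 / (24 * n**3)


for n in (10, 100, 1000):
    print(n, s_exact(n) - s_estimate(n))
```

The differences printed should be far smaller than $n^{-3}$, consistent with an error term of order $n^{-5}$ or better.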

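9.37 This is
\[ \sum_{k,q\ge1} (n - qk)\,[\,n/(q+1) < k \le n/q\,] = n^2 - \sum_{q\ge1} q\Bigl(\binom{\lfloor n/q\rfloor + 1}{2} - \binom{\lfloor n/(q+1)\rfloor + 1}{2}\Bigr) = n^2 - \sum_{q\ge1}\binom{\lfloor n/q\rfloor + 1}{2}. \]
The remaining sum is like (9.55) but without the factor $\mu(q)$. The same method works here as it did there, but we get $\zeta(2)$ in place of $1/\zeta(2)$, so the answer comes to $(1 - \frac{\pi^2}{12})n^2 + O(n\log n)$.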
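The leading constant $1 - \pi^2/12 \approx 0.1775$ can be observed directly. The sketch assumes the quantity being estimated is $\sum_{k=1}^{n} (n \bmod k)$, which is what the bracketed sum over $k$ and $q$ computes when $q = \lfloor n/k\rfloor$; the function name is hypothetical.

```python
import math


def remainder_sum(n):
    """Sum of n mod k for 1 <= k <= n."""
    return sum(n % k for k in range(1, n + 1))


def floor_sum_identity(n):
    """n^2 minus the sum over q >= 1 of C(floor(n/q) + 1, 2)."""
    return n * n - sum((n // q) * (n // q + 1) // 2 for q in range(1, n + 1))


n = 20000
print(remainder_sum(n) / n**2, 1 - math.pi**2 / 12)
```

The second function checks the exact rearrangement into binomial coefficients, while the printed ratio illustrates the asymptotic constant.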

9.38 Replace $k$ by $n - k$ and let $a_k(n) = (n-k)^{n-k}\binom{n}{k}$. Then $\ln a_k(n) = n\ln n - \ln k! - k + O(kn^{-1})$, and we can use tail-exchange with $b_k(n) = n^ne^{-k}/k!$, $c_k(n) = kb_k(n)/n$, $D_n = \{k \mid k \le \ln n\}$, to get $\sum_{k=0}^{n} a_k(n) = n^ne^{1/e}(1 + O(n^{-1}))$.

9.39 Tail-exchange with $b_k(n) = (\ln n - k/n - \frac12k^2/n^2)(\ln n)^k/k!$, $c_k(n) = n^{-3}(\ln n)^{k+3}/k!$, $D_n = \{k \mid 0 \le k \le 10\ln n\}$. When $k \approx 10\ln n$ we have $k! \asymp \sqrt{k}\,(10/e)^k(\ln n)^k$, so the $k$th term is $O(n^{-10\ln(10/e)}\log n)$. The answer is $n\ln n - \ln n - \frac12(\ln n)(1 + \ln n)/n + O(n^{-2}(\log n)^3)$.

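9.40 Combining terms two by two, we find that $H_{2k}^m - (H_{2k} - \frac{1}{2k})^m = \frac{m}{2k}H_{2k}^{m-1}$ plus terms whose sum over all $k \ge 1$ is $O(1)$. Suppose $n$ is even. Euler's summation formula implies that
\[ \sum_{k=1}^{n/2} \frac{H_{2k}^{m-1}}{k} = \sum_{k=1}^{n/2} \frac{(\ln 2e^\gamma k)^{m-1} + O(1/k)}{k} = \frac{(\ln e^\gamma n)^m}{m} + O(1), \]
hence the sum is $\frac12H_n^m + O(1)$. In general the answer is $\frac12(-1)^nH_n^m + O(1)$.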

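9.41 Let $\alpha = \hat\phi/\phi = -\phi^{-2}$. We have
\[ \sum_{k=1}^{n} \ln F_k = \sum_{k=1}^{n} \bigl(\ln\phi^k - \ln\sqrt5 + \ln(1 - \alpha^k)\bigr) = \frac{n(n+1)}{2}\ln\phi - \frac{n}{2}\ln 5 + \sum_{k\ge1}\ln(1 - \alpha^k) - \sum_{k>n}\ln(1 - \alpha^k). \]
The latter sum is $\sum_{k>n} O(\alpha^k) = O(\alpha^n)$. Hence the answer is
\[ \phi^{n(n+1)/2}\,5^{-n/2}\,C + O\bigl(\phi^{n(n-3)/2}\,5^{-n/2}\bigr), \quad\text{where } C = (1 - \alpha)(1 - \alpha^2)(1 - \alpha^3)\ldots \approx 1.226742. \]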
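The constant $C$ and the product formula for $F_1F_2\ldots F_n$ can both be checked numerically. This is a small sketch with hypothetical function names; the infinite product is truncated after a fixed number of factors, which is harmless because $|\alpha| < 0.4$.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
ALPHA = -1 / PHI**2          # alpha = phi-hat / phi


def constant_c(factors=200):
    """Truncated product (1 - alpha)(1 - alpha^2)(1 - alpha^3)..."""
    c = 1.0
    for k in range(1, factors + 1):
        c *= 1 - ALPHA**k
    return c


def fibonacci_product(n):
    """Exact value of F_1 * F_2 * ... * F_n."""
    a, b, prod = 1, 1, 1
    for _ in range(n):
        prod *= a
        a, b = b, a + b
    return prod


def estimate(n):
    """phi^(n(n+1)/2) * 5^(-n/2) * C."""
    return PHI ** (n * (n + 1) / 2) * 5 ** (-n / 2) * constant_c()


print(constant_c())
print(fibonacci_product(10), estimate(10))
```

The relative error of the estimate is of order $|\alpha|^n$, so it is already below one part in ten thousand at $n = 10$.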
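9.42 The hint follows since $\binom{n}{k-1}\big/\binom{n}{k} = \frac{k}{n-k+1} \le \frac{\alpha n}{n - \alpha n + 1} < \frac{\alpha}{1-\alpha}$. Let $m = \lfloor\alpha n\rfloor = \alpha n - \epsilon$. Then
\[ \binom{n}{m} < \sum_{k\le m}\binom{n}{k} < \binom{n}{m}\Bigl(1 + \frac{\alpha}{1-\alpha} + \Bigl(\frac{\alpha}{1-\alpha}\Bigr)^2 + \cdots\Bigr) = \binom{n}{m}\frac{1-\alpha}{1-2\alpha}. \]
So $\sum_{k\le\alpha n}\binom{n}{k} = \binom{n}{m}O(1)$, and it remains to estimate $\binom{n}{m}$. By Stirling's approximation we have $\ln\binom{n}{m} = -\frac12\ln n - (\alpha n - \epsilon)\ln(\alpha - \epsilon/n) - ((1-\alpha)n + \epsilon)\ln(1 - \alpha + \epsilon/n) + O(1) = -\frac12\ln n - \alpha n\ln\alpha - (1-\alpha)n\ln(1-\alpha) + O(1)$.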

9.43 The denominator has factors of the form $z - \omega$, where $\omega$ is a complex root of unity. Only the factor $z - 1$ occurs with multiplicity 5. Therefore by (7.31), only one of the roots has a coefficient $\Omega(n^4)$, and the coefficient is
\[ c = 5/(5! \cdot 1 \cdot 5 \cdot 10 \cdot 25 \cdot 50) = 1/1500000. \]

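9.44 Stirling's approximation says that $\ln(x^{-\alpha}x!/(x-\alpha)!)$ has an asymptotic series
\[ -\alpha - (x + \tfrac12 - \alpha)\ln(1 - \alpha/x) + \frac{B_2}{2\cdot1}\bigl(x^{-1} - (x-\alpha)^{-1}\bigr) + \frac{B_4}{4\cdot3}\bigl(x^{-3} - (x-\alpha)^{-3}\bigr) + \cdots \]
in which each coefficient of $x^{-k}$ is a polynomial in $\alpha$. Hence $x^{-\alpha}x!/(x-\alpha)! = c_0(\alpha) + c_1(\alpha)x^{-1} + \cdots + c_n(\alpha)x^{-n} + O(x^{-n-1})$ as $x \to \infty$, where $c_n(\alpha)$ is a polynomial in $\alpha$. We know that $c_n(\alpha) = \left[{\alpha \atop \alpha-n}\right](-1)^n$ whenever $\alpha$ is an integer, and $\left[{\alpha \atop \alpha-n}\right]$ is a polynomial in $\alpha$ of degree $2n$; hence $c_n(\alpha) = \left[{\alpha \atop \alpha-n}\right](-1)^n$ for all real $\alpha$. In other words, the asymptotic formulas
\[ x^{\underline{\alpha}} = \sum_{k=0}^{n} \left[{\alpha \atop \alpha-k}\right](-1)^k x^{\alpha-k} + O(x^{\alpha-n-1}), \]
\[ x^{\overline{\alpha}} = \sum_{k=0}^{n} \left[{\alpha \atop \alpha-k}\right] x^{\alpha-k} + O(x^{\alpha-n-1}) \]
generalize equations (6.13) and (6.11), which hold in the all-integer case.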

9.45 Let the partial quotients of $\alpha$ be $\langle a_1, a_2, \ldots\rangle$, and let $\alpha_m$ be the continued fraction $1/(a_m + \alpha_{m+1})$ for $m \ge 1$. Then $D(\alpha, n) = D(\alpha_1, n) < D(\alpha_2, \lfloor\alpha_1n\rfloor) + a_1 + 3 < D(\alpha_3, \lfloor\alpha_2\lfloor\alpha_1n\rfloor\rfloor) + a_1 + a_2 + 6 < \cdots < D(\alpha_{m+1}, \lfloor\alpha_m\lfloor\ldots\lfloor\alpha_1n\rfloor\ldots\rfloor\rfloor) + a_1 + \cdots + a_m + 3m < \alpha_1\ldots\alpha_m\,n + a_1 + \cdots + a_m + 3m$, for all $m$. Divide by $n$ and let $n \to \infty$; the limit is less than $\alpha_1\ldots\alpha_m$ for all $m$. Finally we have
\[ \alpha_1\ldots\alpha_m = \frac{1}{K(a_1, \ldots, a_{m-1}, a_m + \alpha_{m+1})} < \frac{1}{F_{m+1}}. \]

9.46 For convenience we write just $m$ instead of $m(n)$. By Stirling's approximation, the maximum value of $k^n/k!$ occurs when $k \approx m \approx n/\ln n$, so we replace $k$ by $m + k$ and find that
\[ \ln\frac{(m+k)^n}{(m+k)!} = n\ln m - m\ln m + m - \frac{\ln 2\pi m}{2} - \frac{(m+n)k^2}{2m^2} + O(k^3m^{-2}\log n). \]
Actually we want to replace $k$ by $\lfloor m\rfloor + k$; this adds a further $O(km^{-1}\log n)$. The tail-exchange method with $|k| \le m^{1/2+\epsilon}$ now allows us to sum on $k$, giving a fairly sharp asymptotic estimate
\[ b_n = \frac{e^{m-1}m^{n-m}}{\sqrt{2\pi m}}\bigl(\Theta_{2m^2/(m+n)} + O(1)\bigr). \]
The requested formula follows, with relative error $O(\log\log n/\log n)$.

A truly Bell-shaped summand.

9.47 Let $\log_m n = l + \theta$, where $0 \le \theta < 1$. The floor sum is $l(n+1) + 1 - (m^{l+1} - 1)/(m-1)$; the ceiling sum is $(l+1)n - (m^{l+1} - 1)/(m-1)$; the exact sum is $(l+\theta)n - n/\ln m + O(\log n)$. Ignoring terms that are $o(n)$, the difference between ceiling and exact is $(1 - f(\theta))n$, and the difference between exact and floor is $f(\theta)n$, where
\[ f(\theta) = \frac{m^{1-\theta}}{m-1} + \theta - \frac{1}{\ln m}. \]
This function has maximum value $f(0) = f(1) = m/(m-1) - 1/\ln m$, and its minimum value is $\ln\ln m/\ln m + 1 - (\ln(m-1))/\ln m$. The ceiling value is closer when $n$ is nearly a power of $m$, but the floor value is closer when $\theta$ lies somewhere between 0 and 1.


9.48 Let $d_k = a_k + b_k$, where $a_k$ counts digits to the left of the decimal point. Then $a_k = 1 + \lfloor\log H_k\rfloor = \log\log k + O(1)$, where 'log' denotes $\log_{10}$. To estimate $b_k$, let us look at the number of decimal places necessary to distinguish $y$ from nearby numbers $y - \epsilon$ and $y + \epsilon'$: Let $\delta = 10^{-b}$ be the length of the interval of numbers that round to $\hat y$. We have $|y - \hat y| \le \frac12\delta$; also $y - \epsilon < \hat y - \frac12\delta$ and $y + \epsilon' > \hat y + \frac12\delta$. Therefore $\epsilon + \epsilon' > \delta$. And if $\delta < \min(\epsilon, \epsilon')$, the rounding does distinguish $\hat y$ from both $y - \epsilon$ and $y + \epsilon'$. Hence $10^{-b_k} < 1/(k-1) + 1/k$ and $10^{1-b_k} \ge 1/k$; we have $b_k = \log k + O(1)$. Finally, therefore, $\sum_{k=1}^{n} d_k = \sum_{k=1}^{n}(\log k + \log\log k + O(1))$, which is $n\log n + n\log\log n + O(n)$ by Euler's summation formula.

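9.49 We have $H_n > \ln n + \gamma + \frac12n^{-1} - \frac1{12}n^{-2} = f(n)$, where $f(x)$ is increasing for all $x > 0$; hence if $n \ge e^{\alpha-\gamma}$ we have $H_n \ge f(e^{\alpha-\gamma}) > \alpha$. Also $H_{n-1} < \ln n + \gamma - \frac12n^{-1} = g(n)$, where $g(x)$ is increasing for all $x > 0$; hence if $n \le e^{\alpha-\gamma}$ we have $H_{n-1} \le g(e^{\alpha-\gamma}) < \alpha$. Therefore $H_{n-1} \le \alpha \le H_n$ implies that $e^{\alpha-\gamma} + 1 > n > e^{\alpha-\gamma} - 1$. (Sharper results have been obtained by Boas and Wrench [27].)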

9.50 (a) The expected return is $\sum_{1\le k\le N} k/(k^2H_N^{(2)}) = H_N/H_N^{(2)}$, and we want the asymptotic value to $O(N^{-1})$:
\[ \frac{\ln N + \gamma + O(N^{-1})}{\pi^2/6 - N^{-1} + O(N^{-2})} = \frac{6\ln 10}{\pi^2}\,n + \frac{6\gamma}{\pi^2} + \frac{36\ln 10}{\pi^4}\,\frac{n}{10^n} + O(10^{-n}). \]
The coefficient $(6\ln 10)/\pi^2 \approx 1.3998$ says that we expect about 40% profit.

(b) The probability of profit is $\sum_{n<k\le N} 1/(k^2H_N^{(2)}) = 1 - H_n^{(2)}/H_N^{(2)}$, and since $H_n^{(2)} = \frac{\pi^2}{6} - n^{-1} + \frac12n^{-2} + O(n^{-3})$ this is
\[ \frac{n^{-1} - \frac12n^{-2} + O(n^{-3})}{\pi^2/6 + O(N^{-1})} = \frac{6}{\pi^2}n^{-1} - \frac{3}{\pi^2}n^{-2} + O(n^{-3}), \]
actually decreasing with $n$. (The expected value in (a) is high because it includes payoffs so huge that the entire world's economy would be affected if they ever had to be made.)

9.51 Strictly speaking, this is false, since the function represented by $O(x^{-2})$ might not be integrable. (It might be '$[x \in S]/x^2$', where $S$ is not a measurable set.) But if we stipulate that $f(x)$ is an integrable function such that $f(x) = O(x^{-2})$ as $x \to \infty$, then $\bigl|\int_n^\infty f(x)\,dx\bigr| \le \int_n^\infty |f(x)|\,dx \le \int_n^\infty Cx^{-2}\,dx = Cn^{-1}$.

9.52 In fact, the stack of $n$'s can be replaced by any function $f(n)$ that approaches infinity, however fast. Define the sequence $\langle m_0, m_1, m_2, \ldots\rangle$ by setting $m_0 = 0$ and letting $m_k$ be the least integer $> m_{k-1}$ such that
\[ \Bigl(\frac{k+1}{k}\Bigr)^{m_k} \ge f(k+1)^2. \]
Now let $A(z) = \sum_{k\ge1}(z/k)^{m_k}$. This power series converges for all $z$, because the terms for $k > |z|$ are bounded by a geometric series. Also $A(n+1) \ge ((n+1)/n)^{m_n} \ge f(n+1)^2$, hence $\lim_{n\to\infty} f(n)/A(n) = 0$.


(As  opposed  to  an 
execrable  function.) 
9.53 By induction, the $O$ term is $(m-1)!^{-1}\int_0^x t^{m-1}f^{(m)}(x-t)\,dt$. Since $f^{(m+1)}$ has the opposite sign to $f^{(m)}$, the absolute value of this integral is bounded by $|f^{(m)}(0)|\int_0^x t^{m-1}\,dt$; so the error is bounded by the absolute value of the first discarded term.

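9.54 Let $g(x) = f(x)/x^\alpha$. Then $g'(x) \sim -\alpha g(x)/x$ as $x \to \infty$. By the mean value theorem, $g(x - \frac12) - g(x + \frac12) = -g'(y) \sim \alpha g(y)/y$ for some $y$ between $x - \frac12$ and $x + \frac12$. Now $g(y) = g(x)(1 + O(1/x))$, so $g(x - \frac12) - g(x + \frac12) \sim \alpha g(x)/x = \alpha f(x)/x^{1+\alpha}$. Therefore
\[ \sum_{k\ge n} \frac{f(k)}{k^{1+\alpha}} = O\Bigl(\sum_{k\ge n}\bigl(g(k - \tfrac12) - g(k + \tfrac12)\bigr)\Bigr) = O\bigl(g(n - \tfrac12)\bigr). \]

Sounds like a nasty theorem.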

9.55 The estimate of $(n + k + \frac12)\ln(1 + k/n) + (n - k + \frac12)\ln(1 - k/n)$ is extended to $k^2/n + k^4/6n^3 + O(n^{-3/2+5\epsilon})$, so we apparently want to have an extra factor $e^{-k^4/6n^3}$ in $b_k(n)$, and $c_k(n) = 2^{2n}n^{-2+5\epsilon}e^{-k^2/n}$. But it turns out to be better to leave $b_k(n)$ untouched and to let
\[ c_k(n) = 2^{2n}n^{-2+5\epsilon}e^{-k^2/n} + 2^{2n}n^{-5+5\epsilon}k^4e^{-k^2/n}, \]
thereby replacing $e^{-k^4/6n^3}$ by $1 + O(k^4/n^3)$. The sum $\sum_k k^4e^{-k^2/n}$ is $O(n^{5/2})$, as shown in exercise 30.

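9.56 If $k \le n^{1/2+\epsilon}$ we have $\ln(n^{\underline{k}}/n^k) = -\frac12k^2/n + \frac12k/n - \frac16k^3/n^2 + O(n^{-1+4\epsilon})$ by Stirling's approximation, hence
\[ n^{\underline{k}}/n^k = e^{-k^2/2n}\bigl(1 + k/2n - \tfrac23k^3/(2n)^2 + O(n^{-1+4\epsilon})\bigr). \]
Summing with the identity in exercise 30, and remembering to omit the term for $k = 0$, gives $-1 + \Theta_{2n} + \Theta_{2n}^{(1)} - \frac23\Theta_{2n}^{(3)} + O(n^{-1/2+4\epsilon}) = \sqrt{\pi n/2} - \frac13 + O(n^{-1/2+4\epsilon})$.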
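The estimate $\sqrt{\pi n/2} - \frac13$ can be compared with a direct evaluation of the sum. This sketch assumes the quantity is $Q(n) = \sum_{k\ge1} n^{\underline{k}}/n^k$ (the sum with the $k = 0$ term omitted, as the answer describes); the function name is hypothetical.

```python
import math


def q_function(n):
    """Sum of n(n-1)...(n-k+1)/n^k for k = 1, ..., n."""
    total, term = 0.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) / n     # multiply in the next factor of the falling power
        total += term
        if term < 1e-18:            # the remaining terms no longer matter in floating point
            break
    return total


for n in (100, 10000, 1000000):
    print(n, q_function(n), math.sqrt(math.pi * n / 2) - 1 / 3)
```

The two columns should agree more and more closely as $n$ grows, with a gap that shrinks like $n^{-1/2}$.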

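9.57 Using the hint, the given sum becomes $\int_0^\infty ue^{-u}\zeta(1 + u/\ln n)\,du$. The zeta function can be defined by the series
\[ \zeta(1 + z) = z^{-1} + \sum_{m\ge0} (-1)^m\gamma_mz^m/m!, \]
where $\gamma_0 = \gamma$ and $\gamma_m$ is the Stieltjes constant
\[ \lim_{n\to\infty}\Bigl(\sum_{k=1}^{n} \frac{(\ln k)^m}{k} - \frac{(\ln n)^{m+1}}{m+1}\Bigr). \]
Hence the given sum is
\[ \ln n + \gamma - 2\gamma_1(\ln n)^{-1} + 3\gamma_2(\ln n)^{-2} - \cdots. \]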
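9.58 Let $0 \le \theta \le 1$ and $f(z) = e^{2\pi iz\theta}/(e^{2\pi iz} - 1)$. We have
\[ |f(z)| = \frac{e^{-2\pi y\theta}}{1 + e^{-2\pi y}} \le 1, \quad\text{when } x \bmod 1 = \tfrac12; \]
\[ |f(z)| \le \frac{e^{-2\pi y\theta}}{|e^{-2\pi y} - 1|} \le \frac{1}{1 - e^{-2\pi\epsilon}}, \quad\text{when } |y| \ge \epsilon. \]
Therefore $|f(z)|$ is bounded on the contour, and the integral is $O(M^{1-m})$. The residue of $2\pi if(z)/z^m$ at $z = k \ne 0$ is $e^{2\pi ik\theta}/k^m$; the residue at $z = 0$ is the coefficient of $z^{-1}$ in
\[ \frac{e^{2\pi iz\theta}}{z^{m+1}}\Bigl(B_0 + B_1\frac{2\pi iz}{1!} + \cdots\Bigr) = \frac{1}{z^{m+1}}\Bigl(B_0(\theta) + B_1(\theta)\frac{2\pi iz}{1!} + \cdots\Bigr), \]
namely $(2\pi i)^mB_m(\theta)/m!$. Therefore the sum of residues inside the contour is
\[ \frac{(2\pi i)^m}{m!}B_m(\theta) + 2\sum_{k=1}^{M} e^{\pi im/2}\,\frac{\cos(2\pi k\theta - \pi m/2)}{k^m}. \]
This equals the contour integral $O(M^{1-m})$, so it approaches zero as $M \to \infty$.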

9.59 If $F(x)$ is sufficiently well behaved, we have the general identity
\[ \sum_k F(k + t) = \sum_n G(2\pi n)e^{2\pi int}, \]
where $G(y) = \int_{-\infty}^{+\infty} e^{-iyx}F(x)\,dx$. (This is "Poisson's summation formula," which can be found in standard texts such as Henrici [151, Theorem 10.6e].)

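9.60 The stated formula is equivalent to
\[ n^{\overline{1/2}} = n^{1/2}\Bigl(1 - \frac{1}{8n} + \frac{1}{128n^2} + \frac{5}{1024n^3} - \frac{21}{32768n^4} + O(n^{-5})\Bigr) \]
by exercise 5.22. Hence the result follows from exercises 6.64 and 9.44.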
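A numerical comparison illustrates the kind of expansion involved. The sketch assumes that the formula stated in the exercise is the familiar expansion $\binom{2n}{n} = \frac{4^n}{\sqrt{\pi n}}\bigl(1 - \frac1{8n} + \frac1{128n^2} + \frac5{1024n^3} - \frac{21}{32768n^4} + O(n^{-5})\bigr)$; that identification and the function name are assumptions made for illustration, not something the answer itself spells out.

```python
from math import comb, pi, sqrt


def central_binomial_estimate(n):
    """4^n / sqrt(pi n) times the assumed five-term series in 1/n."""
    series = (1 - 1 / (8 * n) + 1 / (128 * n**2)
              + 5 / (1024 * n**3) - 21 / (32768 * n**4))
    return 4**n / sqrt(pi * n) * series


for n in (5, 20, 50):
    exact = comb(2 * n, n)
    print(n, central_binomial_estimate(n) / exact - 1)
```

The printed relative errors should decrease roughly like $n^{-5}$.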

9.61 The idea is to make $\alpha$ "almost" rational. Let $a_k = 2^{2^{2^k}}$ be the $k$th partial quotient of $\alpha$, and let $n = \frac12a_{m+1}q_m$, where $q_m = K(a_1, \ldots, a_m)$ and $m$ is even. Then $0 < \{q_m\alpha\} < 1/K(a_1, \ldots, a_{m+1}) < 1/(2n)$, and if we take $v = a_{m+1}/(4n)$ we get a discrepancy $\ge \frac14a_{m+1}$. If this were less than $n^{1-\epsilon}$ we would have
\[ a_{m+1}^{\epsilon} = O(q_m^{1-\epsilon}); \]
but in fact $a_{m+1} > q_m^{2^m}$.
"The paradox is now fully established that the utmost abstractions are the true weapons with which to control our thought of concrete fact."

-A. N. Whitehead [304]

9.62 See Canfield [43]; see also David and Barton [60, Chapter 16] for asymptotics of Stirling numbers of both kinds.

9.63 Let $c = \phi^{2-\phi}$. The estimate $cn^{\phi-1} + o(n^{\phi-1})$ was proved by Fine [120]. Ilan Vardi observes that the sharper estimate stated can be deduced from the fact that the error term $e(n) = f(n) - cn^{\phi-1}$ satisfies the approximate recurrence $c^\phi n^{2-\phi}e(n) \approx -\sum_k e(k)\,[1 \le k < cn^{\phi-1}]$. The function
\[ \frac{n^{\phi-1}\,u(\ln\ln n/\ln\phi)}{\ln n} \]
satisfies this recurrence asymptotically, if $u(x + 1) = -u(x)$. (Vardi conjectures that
\[ f(n) = n^{\phi-1}\Bigl(c + u\Bigl(\frac{\ln\ln n}{\ln\phi}\Bigr)(\ln n)^{-1} + o\bigl((\log n)^{-2}\bigr)\Bigr) \]
for some such function $u$.) Calculations for small $n$ show that $f(n)$ equals the nearest integer to $cn^{\phi-1}$ for $1 \le n \le 400$ except in one case: $f(273) = 39 > c \cdot 273^{\phi-1} \approx 38.4997$. But the small errors are eventually magnified, because of results like those in exercise 2.36. For example, $e(201636503) \approx 35.73$; $e(919986484788) \approx -1959.07$.
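The small-$n$ calculations can be repeated in a few lines. The sketch assumes that $f$ is Golomb's self-describing sequence (the nondecreasing sequence in which $k$ occurs exactly $f(k)$ times), generated here by the well-known recurrence $f(n) = 1 + f(n - f(f(n-1)))$; the function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
C = PHI ** (2 - PHI)


def golomb(limit):
    """Return a list f with f[n] = n-th term of Golomb's sequence for 1 <= n <= limit."""
    f = [0, 1]                       # f[0] is unused padding; f[1] = 1
    for n in range(2, limit + 1):
        f.append(1 + f[n - f[f[n - 1]]])
    return f


def mismatches(limit):
    """Values of n <= limit where f(n) differs from the rounded estimate c * n^(phi - 1)."""
    f = golomb(limit)
    return [n for n in range(1, limit + 1) if f[n] != round(C * n ** (PHI - 1))]


f = golomb(400)
print(f[1:13])
print(f[273], C * 273 ** (PHI - 1))
print(mismatches(400))
```

The last line lists the exceptional values of $n$ up to 400, which can be compared with the single exception reported in the answer.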

9.64 (From this identity for $B_2(x)$ we can easily derive the identity of exercise 58 by induction on $m$.) If $0 < x < 1$, the integral $\int_x^{1/2}\sin N\pi t\,dt/\sin\pi t$ can be expressed as a sum of $N$ integrals that are each $O(N^{-2})$, so it is $O(N^{-1})$; the constant implied by this $O$ may depend on $x$. Integrating the identity
\[ \sum_{n=1}^{N}\cos 2n\pi t = \Re\bigl(e^{2\pi it}(e^{2N\pi it} - 1)/(e^{2\pi it} - 1)\bigr) = -\tfrac12 + \tfrac12\sin(2N+1)\pi t/\sin\pi t \]
and letting $N \to \infty$ now gives $\sum_{n\ge1}(\sin 2n\pi x)/n = \frac{\pi}{2} - \pi x$, a relation that Euler knew ([85'] and [88, part 2, §92]). Integrating again yields the desired formula. (This solution was suggested by E. M. E. Wermuth; Euler's original derivation did not meet modern standards of rigor.)

9.65 The expected number of distinct elements in the sequence $1, f(1), f(f(1)), \ldots$, when $f$ is a random mapping of $\{1, 2, \ldots, n\}$ into itself, is the function $Q(n)$ of exercise 56, whose value is $\frac12\sqrt{2\pi n} + O(1)$; this might account somehow for the factor $\sqrt{2\pi n}$.

9.66  It  is  known  that  lnXn  ~ |n2  In  the  constant  e_7t26  has  been  verified 
empirically  to  eight  significant  digits. 

9.67  This  would  fail  if,  for  example,  eu~Y  = m+  j + e/m  for  some  integer  m 
and  some  0 < e<  but  no  counterexamples  are  known. 
THE EXERCISES in this book have been drawn from many sources. The authors have tried to trace the origins of all the problems that have been published before, except in cases where the exercise is so elementary that its inventor would probably not think anything was being invented.

Many of the exercises come from examinations in Stanford's Concrete Mathematics classes. The teaching assistants and instructors often devised new problems for those exams, so it is appropriate to list their names here:
Henson Graves, Louis Jouaillec
1975
1976

Andy Yao
same TAs next year.

1977
Yossi Shiloach

1978

Frances Yao

Yossi Shiloach

Class notes very
1979
Frank Liang, Chris Tong, Mark Haiman
ling numbers.
In addition, David Klarner (1971), Bob Sedgewick (1974), Leo Guibas (1975), and Lyle Ramshaw (1979) each contributed to the class by giving six or more guest lectures. Detailed lecture notes taken each year by the teaching assistants and edited by the instructors have served as the basis of this book.
3.46 Graham and Pollak [132].

3.51 Fraenkel [103].

3.52 S. K. Stein.*

4.4 [180, §526].

4.16 Sylvester [283].

4.19 Bertrand [23, p. 129]; Chebyshev [50]; Wright [309].

4.22 Brillhart [34]; Williams and Dubner [305].

4.24 Legendre [196, second edition,

4.40 Stickelberger [280].

4.56 Logan [202, eq. (6.15)].

4.57 A special case appears in [182].

4.58 Sierpinski [266].

4.59 Curtiss [59]; Erdős [76].

4.60 Mills [216].

4.63 Barlow [17]; Abel [1].

4.67 [127].

4.69 Cramer [56].

4.70 P. Erdős.*

4.71 [77, p. 96].
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5. 48  Ranjan  Roy.  * 

5.49  Roy  [255,  eq.  3.13], 

5.53  Gauss  [116];  Richard  Askey.* 

5.58  Frazer  and  McKellar  [107]. 

5.59  Stanford  Computer  Science  Compre- 
hensive Exam,  Winter  1987. 

5.60  [173,  exercise  1.2.  6-41]. 

5.61  Lucas  [206]. 

5.62  1971  midterm. 


5.63 1974 midterm.

5.64 1980 midterm.

5.65 1983 midterm.

5.66 1984 midterm.

5.67 1976 midterm.

5.68 1985 midterm.

5.69 Lyle Ramshaw, guest lecture in 1986.

5.70 Andrews [9, theorem 5.4].

5.71  H.  S.  Wilf  [304',  exercise  4.16]. 

5.72  Hermite  [154]. 

5.74 1979 midterm.

5.75  1971  midterm. 

5.76  [173,  exercise  1.2.6-59  (corrected)]. 

5.77  1986  midterm. 

5.78 [176].

5.79 Mendelsohn [215]; Montgomery [218].

5.81  1986  final  exam. 

5.82 Hillman and Hoggatt [157].

5.85 Hsu [159].

5.86 Good [123].

5.88 Hermite [155].

5.91 Whipple [301].

5.92 Clausen [51], [52].

5.93 Gosper [124].

5.94  Henrici  [152,  p.  118]. 

5.95  [77,  p.  71]. 

5.96 [77, p. 71].

5.97  R.  William  Gosper,  Jr.* 

6.6 Fibonacci [98, p. 283].

6.15 [175, exercise 5.1.3-2].

6.21  Theisinger  [287]. 

6.25  Gardner  [112]  credits  Denys  Wilquin. 

6.27  Lucas  [205]. 

6.28  Lucas  [207,  chapter  18]. 

6.31  Lah  [193];  R.  W.  Floyd.* 

6.35 1977 midterm.

6.37 Shallit [263].

6.39 [173, exercise 1.2.7-15].

6.40 Klamkin [169, problem 1979/1].

6.41 1973 midterm.

6.43  Brooke  and  Wall  [36]. 

6.44  Matiiasevich  [213]. 
6.46 Francesca [106]; Wallis [295, chapter 4].

6.47 Lucas [205].

6.48  [174,  exercise  4.5.3-9(c)]. 

6.49  Davison  [61]. 

6.50 1985 midterm; Rham [248]; Dijkstra [66, pp. 230-232].

6.51 Waring [296]; Lagrange [191]; Wolstenholme [306].

6.52 Eswarathasan and Levine [79].

6.53  Kaucky  [168]  treats  a special  case. 

6.54  Staudt  [276];  Clausen  [53];  Rado  [242]. 

6.55 Andrews and Uchimura [12].

6.56  1986  midterm. 

6.57 1984 midterm, suggested by R. W. Floyd.*

6.58 [173, exercise 1.2.8-30]; 1982 midterm.

6.59 Burr [42].

6.61  1976  final  exam. 

6.62 Borwein and Borwein [31, §3.7].

6.63  [173,  section  1.2.10];  Stanley  [275, 
proposition  1.3.12]. 

6.65  Tanny  [286]. 

6.66  Logan  [202']. 

6.67 [175, exercise 6.1-13].

6.70 Euler [88, part 2, chapter 8].

6.72 [175, exercise 5.1.3-3].

6.73 Euler [86, chapters 9 and 10]; Schroter [260].

6.74  Logan  [202']. 

6.75  Comic  section,  Boston  Herald, 

August  21,  1904. 

6.76  Silverman  and  Dunn  [268]. 

6.78 [183].

6.79 [126], modulo a numerical error.

6.80 [174, exercises 4.5.3-2 and 3].

6.81 Adams and Davison [3].

6.84 Lehmer [198].

6.85 Burr [42].

6.87 Part (a) is from Eswarathasan and
Levine [79].

7.2 [173, exercise 1.2.9-1].


7.8 Zave [311].

7.9 [173, exercise 1.2.7-22].

7.11 1971 final exam.

7.12 [175, pp. 63-64].

7.13 Raney [243].

7.15 Bell [20].

7.16 Polya [237, p. 149]; [173, exercise
2.3.4.4-1].

7.20 Jungen [167, p. 299] credits A. Hurwitz.

7.22  Polya  [239]. 

7.23  1983  homework. 

7.24  Myers  [222];  Sedlacek  [262]. 

7.25  [174,  Carlitz’s  proof  of  lemma  3.3.3B]. 

7.26  [173,  exercise  1.2.8-12]. 

7.32  [77,  pp.  25-26]  credits  L.  Mirsky  and 
M.  Newman. 

7.33  1971  final  exam. 

7.34 Tomas Feder.*

7.36  1974  final  exam. 

7.37  Euler  [87,  §50];  1971  final  exam. 

7.38  1973  final  exam. 

7.39  [173,  exercise  1.2.9-18]. 

7.41 Andre [8]; [175, exercise 5.1.4-22].

7.42 1974 final exam.

7.44 Gross [136]; [175, exercise 5.3.1-3].

7.45 de Bruijn [63].

7.47  Waugh  and  Maxfield  [297]. 

7.48  1984  final  exam. 

7.49 Waterhouse [296'].

7.50 Schroder [259]; [173, exercise 2.3.4.4-31].

7.51 Fisher [99]; Percus [232, pp. 89-123];
Stanley [274].

7.52 Hammersley [146].

7.53 Euler [92, part 2, section 2, chapter 6, §91].

7.54 Moessner [217].

7.55 Stanley [273].

7.56 Euler [91].

7.57  [77,  p.  48]  credits  P.  Erdos  and 
P.  Turan. 


C CREDITS  FOR  EXERCISES  605 


8.13  Thomas  M.  Cover.* 

8.15  [173,  exercise  1.2.10-17]. 

8.17  Patil  [228]. 

8.24  John  Knuth  (age  4)  and  DEK;  1975 
final. 

8.26 [173, exercise 1.3.3-18].

8.27  Fisher  [100]. 

8.29  Guibas  and  Odlyzko  [138]. 

8.32  1977  final  exam. 

8.34  Hardy  [149]  has  an  incorrect  analysis 
leading  to  the  opposite  conclusion. 

8.35  1981  final  exam. 

8.36  Gardner  [113]  credits  George  Sicher- 
man. 

8.38  [174,  exercise  3.3.2-10]. 

8.39  [177,  exercise  4.3(a)]. 

8.41 Feller [96, exercise IX.33].

8.43  [173,  sections  1.2.10  and  1.3.3]. 

8.44  1984  final  exam. 

8.46  Feller  [96]  credits  Hugo  Steinhaus. 

8.47  1974  final,  suggested  by  “fringe 
analysis”  of  2-3  trees. 

8.48  1979  final  exam. 

8.49  Blom  [26];  1984  final  exam. 

8.50  1986  final  exam. 

8.51  1986  final  exam. 

8.53  Feller  [96]  credits  S.  N.  Bernstein. 

8.57  Lyle  Ramshaw.* 

8.63  Guibas  and  Odlyzko  [138]. 

9.1  Hardy  [148,  1.3(g)]. 

9.2 Part (c) is from Garfunkel [114].

9.3 [173, exercise 1.2.11.1-6].

9.6 [173, exercise 1.2.11.1-3].

9.8 Hardy [148, 1.2(iv)].

9.9 Landau [194, vol. 1, p. 60].

9.14 [173, exercise 1.2.11.3-6].

9.16 Knopp [170, edition ≥ 2, §64C].

9.18  Bender  [21,  §3.1]. 

9.20  1971  final  exam. 

9.24  [134,  §4.1.6]. 

9.27  Titchmarsh  [289]. 

9.28  [173,  exercise  1.2.11.2-7]. 


9.29  de  Bruijn  [62,  section  3.7]. 

9.32  1976  final  exam. 

9.34  1973  final  exam. 

9.35  1975  final  exam. 

9.36  1980  class  notes. 

9.37 [174, eq. 4.5.3-21].

9.38 1977 final exam.

9.39 1975 final exam, inspired by Reich [247].

9.40 1977 final exam.

9.41 1980 final exam.

9.42 1979 final exam.

9.44 Tricomi and Erdelyi [290].

9.46 de Bruijn [62, §6.3].

9.47 1980 homework; [175, eq. 5.3.1-34].

9.48  1980  final  exam. 

9.49  1974  final  exam. 

9.50  1984  final  exam. 

9.51  [134,  §4.2.1]. 

9.52  Poincare  [235];  Borel  [30,  p.  27]. 

9.53 Polya and Szego [240, part 1, problem
140].

9.57  Andrew  M.  Odlyzko.* 

9.58  Henrici  [151,  exercise  4.9.8]. 

9.60  Ilan  Vardi.* 

9.62  Canfield  [43]. 

9.63  Ilan  Vardi.* 

9.65 M. P. Schützenberger.*

9.66 Lieb [201']; Stanley [275, exercise
4.37(c)].

9.67 Boas and Wrench [27].

* Unpublished  personal  communication. 
Aaronson, Bette Jane, ix.

Abel,  Niels  Henrik,  578,  603. 

Abramowitz,  Milton,  42,  578. 

Absolute  convergence,  60-61,  64. 

Absolute  error,  438,  441. 

Absolute  value  of  complex  number,  64. 
Absorption  identities,  157-158,  247. 

Acton,  John  Emerich  Edward  Dalberg, 
baron,  66. 

Adams,  William  Wells,  578,  604. 
Addison-Wesley,  ix. 

Addition  formula,  158-159,  245,  247. 

Aho,  Alfred  Vaino,  578,  602. 

Ahrens,  Wilhelm  Ernst  Martin  Georg,  8, 
578,  602. 

Akhiezer,  Naum  Il’ich,  578. 

Alfred  [Brousseau],  Brother  Ulbertus,  580, 
602. 

Algebraic  integers,  147. 

Algorithms,  analysis  of,  138,  399-412. 
divide  and  conquer,  79. 

Euclid’s,  103,  123,  289-290. 

Fibonacci’s,  95,  101. 

Gosper’s,  224-226,  519. 
greedy,  101,  281. 
self-certifying,  104. 

Alice,  31,  394-396,  416. 

Allardice, Robert Edgar, 2, 578.


American  Mathematical  Society,  viii. 

AMS  Euler,  ix,  625. 

Analysis  of  algorithms,  138,  399-412. 
Analytic  functions,  196. 

Ancestor,  117,  277. 

Andre,  Antoine  Desire,  578,  604. 

Andrews,  George  W.  Eyre,  215,  316,  515, 
579,  603,  604. 

Answers,  notes  on,  viii,  483,  606. 
Anti-difference  operator,  48,  54,  456-457. 
Approximation, 8, 76, 87-89, 110, 114,
425-482.

of sums by integrals, 45, 262-263,
455-461.

Archibald,  Raymond  Clare,  581. 

Argument  of  hypergeometric,  205. 
Arithmetic  progression,  26,  30,  362. 
Armageddon,  85. 

Armstrong,  Daniel  Louis  (=  Satchmo),  80. 
Ascents,  253-254,  256. 

Askey,  Richard  Allen,  603. 

Associative  law,  30,  61,  64. 

Asymptotics, 8, 76, 110, 114, 425-482.

for  sums,  87-89,  452-482. 

Atkinson,  Michael  David,  579,  602. 

Austin,  A.  K.,  581. 

Automaton, 391.

Automorphic  numbers,  505. 
Average, defined, 370.
of  a reciprocal,  418. 
variance,  409-411. 

Bachmann,  Paul  Gustav  Heinrich,  429,  448, 
579. 

Bailey,  Wilfrid  Norman,  223,  579,  603. 

Ball,  Walter  William  Rouse,  579,  602. 

Banach,  Stefan,  419. 

Barlow,  Peter,  579,  603. 

Barton,  David  Elliott,  577,  582. 

Baseball,  73,  148,  195,  616,  620,  621. 

BASIC,  173,  432. 

Basic  fractions,  134,  138. 

Basis  of  induction,  3,  10-11,  306-307. 
Bateman,  Harry,  595. 

Baum,  Lyman  Frank,  556. 

Beatty,  Samuel,  579,  602. 

Bee  trees,  277. 

Beeton,  Barbara  Ann  Neuhaus  Friend  Smith, 
viii. 

Bell,  Eric  Temple,  318,  579,  604. 

numbers,  359,  479. 

Bender,  Edward  Anton,  579,  605. 

Bernoulli,  Jakob  (=  Jacobi  = Jacques  = 
James),  269,  456,  579. 
numbers,  see  Bernoulli  numbers, 
polynomials,  353,  456-458. 
trials,  388,  see  Coins,  flipping. 

Bernoulli,  Johann  (=  Jean),  593. 

Bernoulli  numbers,  269-276,  301,  303,  353, 
456. 

calculation  of,  274. 

generalized,  see  Stirling  polynomials, 
generating  function  for,  271,  337,  351. 
Bernshtein  (=  Bernstein),  Sergei  Natanovich, 
605. 

Bertrand, Joseph Louis François, 145, 579,
602. 

postulate,  145,  487,  528. 

Bessel,  Friedrich  Wilhelm,  function,  206, 

512. 


Beyer,  William  Hyman,  579. 

Biased  coin,  387. 

Bicycle,  246,  486. 

Bieberbach,  Ludwig,  589. 

Bienayme,  Irenee  Jules,  580. 

Big  Ell  notation,  430. 

Big  Oh  notation,  76,  429-435. 

Big  Omega  notation,  434. 

Big  Theta  notation,  434. 

Bijection,  39. 

Bill,  394-396,  416. 

Binary  logarithm,  70. 

Binary notation (radix 2), 11-13, 15, 70,

113. 

Binary  partitions,  363. 

Binary  search,  121,  183. 

Binary  trees,  117. 

Binet,  Jacques  Philippe  Marie,  285,  289,  580. 
Binomial  coefficients,  153-242. 

combinatorial  interpretation,  153,  158, 

160,  169-170. 
definition,  154,  211. 
dual,  515. 
indices  of,  154. 
middle,  187,  242. 
reciprocal  of,  188. 
top  ten  identities  of,  174. 
wraparound,  238  (exercise  75),  301. 
Binomial  convolution,  351,  353. 

Binomial  distribution,  387-388,  401,  414, 

418. 

negative,  388-389,  414. 

Binomial  number  system,  234. 

Binomial  series,  generalized,  200-204,  232, 
240,  349. 

Binomial  theorem,  162-163,  199,  206,  221. 
Blom,  Gunnar,  580,  605. 

Bloopergeometric  series,  232. 

Boas,  Ralph  Philip,  Jr.,  viii,  574,  580,  605. 
Boggs,  Wade  Anthony,  195. 

Bohl,  Piers  Paul  Felix,  87,  580. 
Bois-Reymond, Paul David Gustav du, 426,
580,  589. 

Boncompagni,  Prince  Baldassarre,  585. 
Bootstrapping,  449-452. 

Borchardt,  Carl  Wilhelm,  589. 

Borel,  Emile  Felix  Edouard  Justin,  580,  605. 
Borwein,  Jonathan  Michael,  580,  604. 
Borwein, Peter Benjamin, 580, 604.

Bound  variables,  22. 

Boundary  conditions,  24-25,  75,  86,  159. 
Bowling,  6. 

Box  principle,  95,  130,  497. 

Brahma,  Tower  of,  1,  4,  264. 

Brent,  Richard  Peirce,  292,  510,  540,  580. 
Bricks,  299,  360. 

Brillhart,  John  David,  580,  602. 

Brocot,  Achille,  116,  580. 

Broder,  Andrei  Zary,  ix,  601. 

Brooke,  Maxey,  580,  603. 

Brousseau,  Brother  Alfred,  580,  602. 

Brown,  Mark  Robbin,  601. 

Brown,  Morton,  487,  580. 

Brown,  Roy  Howard,  ix. 

Brown,  Thomas  Craig,  581,  602. 

Brown,  Trivial,  581. 

Brown,  William  Gordon,  344,  581. 

Brown  University,  ix. 

Browning,  Elizabeth  Barrett,  306. 

Bubblesort,  434. 

Buckholtz, Thomas Joel, 593.

Burr,  Stefan  Andrus,  581,  604. 

Calculators,  67,  330. 

Calculus,  vi,  33. 

finite  and  infinite,  47-56. 

Candy,  36. 

Canfield,  Earl  Rodney,  577,  581,  605. 

Cards,  shuffling,  423. 

stacking,  259-260,  295. 

Carlitz,  Leonard,  604. 

Carroll,  Lewis  (=  Dodgson,  Rev.  Charles 
Lutwidge),  31,  279,  581,  582,  599. 


Carry,  70,  233,  283,  537. 

Cassini,  Jean  Dominique,  278,  581. 

identity,  278-279,  286,  289,  296,  300. 
Catalan,  Eugene  Charles,  203,  347,  581. 
Catalan  numbers,  181,  203,  303. 

combinatorial  interpretations,  344-346, 
541. 

generalized,  347. 
table  of  identities,  203. 

Cauchy,  Augustin  Louis,  581,  602. 

inequality,  64. 

Cech,  Eduard,  vi. 

Ceiling  function,  67-69. 

Center  of  gravity,  259-260. 

Certificate  of  correctness,  104. 

Chace,  Arnold  Buffum,  581,  602. 
Chaimovich,  M.,  581. 

Chain  rule,  54,  469. 

Change,  313-316,  360. 

large  amounts  of,  330-332,  478. 

Changing  the  index  of  summation,  30-31, 
39. 

Changing  the  tails  of  a sum,  452-455. 
Cheating,  viii,  158,  309,  374,  387. 
Chebyshev, Pafnutiĭ L'vovich, 38, 145, 581,
602. 

inequality,  376-377,  414,  416,  555. 
summation  inequalities,  38. 

Cheese  slicing,  19. 

Chen,  Pang-Chieh,  601. 

Chinese  Remainder  Theorem,  126,  146. 

Chu  Shih-Chieh,  169. 

Chung,  Fan-Rong  King,  ix. 

Clausen,  Thomas,  582,  603,  604. 

product  identities,  241. 

Clearly,  clarified,  403,  556. 

Cliches,  166,  310. 

Closed  form,  3,  7,  108,  317,  548. 

Closed  interval,  73-74. 

Cobb,  Tyrus  Raymond,  195. 
Coins, 313-316.
biased,  387. 
fair,  387,  416. 
flipping,  387-396. 
spinning,  387. 

Collingwood,  Stuart  Dodgson,  279,  582. 

Collins,  John,  594. 

Colombo, Cristoforo (= Columbus, Christopher), 74.

Colors,  482. 

Columbia  University,  ix. 

Combinations,  153. 

Common  logarithm,  435. 

Commutative  law,  30,  61,  64,  308. 
relaxed,  31. 

Complete  graph,  354. 

Complex  factorial  powers,  211. 

Complex  numbers,  64. 
roots  of  unity,  149,  204,  361,  530,  550,  572. 

Composite  numbers,  105. 

Composition of generating functions, 414.

Concrete  Math  Club,  74. 

Concrete  mathematics,  defined,  vi. 

Conditional  convergence,  59. 

Conditional  probability,  402-405,  410-411. 

Confluent  hypergeometric  series,  206. 

Congruences,  124-126. 

Connection  Machine,  131. 

Contiguous  hypergeometrics,  514. 

Continuants,  287-295,  298,  300,  487. 

Continued  fractions,  287,  290-295,  304,  540. 

Convergence,  206,  317,  517. 
absolute,  60-61,  64. 
conditional,  59. 

Convex  regions,  5,  20,  483. 

Convolution,  197,  319,  339-350. 
binomial,  351,  353. 
identities  for,  202,  258. 

Conway,  John  Horton,  396,  566,  582. 

Cotangent  function,  272,  303. 


Counting,  combinations,  153. 
cycle  arrangements,  247-248. 
derangements,  193-196,  199-200. 
with  generating  functions,  306-316. 
integers  in  intervals,  73-74. 
necklaces,  139-141. 
parenthesized  formulas,  343-345. 
permutations,  111,  253-254. 
set  partitions,  245. 
spanning  trees,  335,  354. 

Coupon  collecting,  558. 

Cover,  Thomas  Merrill,  605. 

Coxeter,  Harold  Scott  Macdonald,  579. 

Cramer,  Carl  Harald,  510,  582,  603. 

Cray  X-MP,  109. 

Crelle,  August  Leopold,  582,  602. 

Cribbage,  65. 

Crispin,  Mark  Reed,  598. 

Crowe,  Donald  Warren,  582,  602. 

Crudification,  433. 

Cubes,  sum  of  consecutive,  51,  63,  269,  275, 
353. 

Cumulants,  383-387,  414,  415,  424. 

CUNY  (=  City  University  of  New  York),  ix. 

Curtiss,  David  Raymond,  582,  603. 

Cycles,  139,  245,  248,  486. 

Cyclic  shift,  12. 

Cyclotomic  polynomial,  149. 

δ, see Finite calculus.

Δ, see Difference operator.

D,  see  Derivative  operator. 

David,  Florence  Nightingale,  577,  582. 

Davison,  John  Leslie,  293,  578,  582,  604. 

de  Branges,  Louis,  589. 

de  Bruijn,  Nicolaas  Govert,  430,  433,  486, 
582,  604,  605. 
cycle,  486. 

de  Moivre,  Abraham,  283,  467,  582. 

Definite sums, analogous to definite
integrals, 49-50.


610  INDEX 


Degenerate  hypergeometric  series,  210,  216, 
222,  235. 

Derangements,  193-196,  199-200,  379-380, 
386-387,  414. 

Derivative operator, 33, 47, 191, 219-221,
296, 319, 350-351, 456-457.

Descents,  see  Ascents. 

dgf:  Dirichlet  generating  function. 

Dice,  367-370,  413,  415. 
fair,  368,  403. 
loaded,  368,  413. 
nonstandard,  417. 
supposedly  fair,  378. 

Dickson,  Leonard  Eugene,  496,  583. 
Dieudonne,  Jean  Alexandre,  500. 

Difference  operator,  47-55,  456-457. 
nth  order,  187-192. 

Differentiably  finite  power  series,  360,  366. 
Differential  operators,  see  Derivative 
operator  and  Theta  operator. 

Difficulty  measure  for  summation,  181. 
Dijkstra,  Edsger  Wybe,  173,  583,  604. 
Dimers  and  dimes,  306,  see  Dominoes  and 
Change. 

Diphages,  420,  424. 

Dirichlet,  Peter  Gustav  Lejeune,  356,  583, 
602. 

box  principle,  95,  130,  497. 
generating  functions,  356-357,  359,  418, 
437. 

probability  generating  functions,  418. 
Discrepancy,  88-89,  97,  304,  478,  481. 
Discrete  probability,  367-424. 

defined,  367. 

Disease,  319. 

Distribution,  of  probabilities,  367. 

of things into groups, 83-85.

Distributive  law,  30,  35,  60,  64,  83. 
Divergent  sums,  60,  334,  517. 

Divide  and  conquer,  79. 

Divides  exactly,  112-114,  146,  233. 


Divisibility,  102-105. 

of  polynomials,  225. 

Dixon,  Alfred  Cardew,  583,  603. 

formula,  214. 

DNA,  Martian,  363. 

Dodgson,  Charles  Lutwidge,  see  Carroll. 
Dominoes,  306-313,  357. 

Double  sums,  34-41,  105,  237. 

Doubly  exponential  recurrences,  97,  100, 

101,  109. 

Doubly  infinite  sums,  59,  98,  468-469. 
Dougall,  John,  171,  583. 

Downward  generalization,  2,  95,  306-307. 
Doyle,  Sir  Arthur  Conan,  162,  227-228,  391, 
583. 

Drones,  277. 

Drysdale,  Robert  Lewis  (Scot),  III,  601. 
du  Bois-Reymond,  Paul  David  Gustav,  426, 
580,  589. 

Duality,  63  (exercise  17),  68-69,  253,  515. 
Dubner,  Harvey,  600,  602. 

Dudeney,  Henry  Ernest,  583,  602. 

Dunkel,  Otto,  586,  602. 

Dunn,  Angela  Fox,  597,  604. 

Dunnington,  Guy  Waldo,  583. 

Duplication  formulas,  186,  232. 

Dupre,  Lyn  Oppenheim,  ix. 

Durst,  Lincoln  Kearney,  viii. 

Dyson,  Freeman  John,  172,  587. 

e,  70,  122,  570. 

E,  55,  188,  191. 

Edwards,  Anthony  William  Fairbank,  583. 
Eeny-meeny-miny-mo,  see  Josephus  prob- 
lem. 

Efficiency,  24. 

egf:  Exponential  generating  function. 

Eggs,  158. 

Egyptian  mathematics,  95,  150,  581. 
Einstein,  Albert,  72,  293. 

Eisele,  Carolyn,  595. 
Eisenstein, Ferdinand Gotthold Max, 202,

583. 

Elementary  events,  367-368. 

Elkies,  Noam  David,  131. 

Ellipsis  (•••),  21,  50,  108. 

Empirical  estimates,  377-379,  413. 

Empty  case,  2,  244,  306-307,  335,  541. 

Empty  product,  48,  106. 

Empty  sum,  23,  48. 

Entier  function,  see  Floor  function. 

Equality,  one-way,  432-433. 

Equivalence  relation,  124. 

Eratosthenes, sieve of, 111.

Erdelyi,  Arthur,  599,  605. 

Erdos,  Pal  (=  Paul),  510,  526,  550,  583-584, 
603,  604. 

Error,  absolute  versus  relative,  438,  441. 

Error  function,  166. 

Eswarathasan,  Arulappah,  584,  604. 

Euclid (= Εὐκλείδης), 107-108, 584.
algorithm,  103-104,  123,  289-290. 
numbers,  108,  145,  150,  151. 

Euler,  Leonhard,  i,  vii,  ix,  6,  48,  122,  131, 

133,  134,  205,  207,  210,  232,  253,  263, 

264,  272,  285,  287,  289,  455,  457,  499, 

514,  550,  577,  579,  584-585,  602-604. 
constant,  264,  292,  304,  467. 
identity  for  hypergeometrics,  233. 
numbers,  535,  591;  see  also  Eulerian 
numbers. 

polynomials,  549. 
summation  formula,  455-461. 
theorem,  133,  141,  147. 
totient function, 133-135, 137-144, 357,
448-449.

triangle,  254,  303. 

Eulerian  numbers,  253-257,  296,  302,  364, 

550. 

combinatorial  interpretations,  253-254,  534. 
generalized,  299. 
generating  function  for,  337. 
second-order,  256-257. 


Event,  368. 

Eventually  positive  function,  428. 

Exact  cover,  362. 

Exactly  divides,  112-114,  146,  233. 
Excedances,  302. 

Exercises,  levels  of,  viii,  72-73,  95,  497. 
exp:  Exponential  function,  441. 

Expectation,  see  Expected  value. 

Expected  value,  371-373,  381. 

Exponential  function,  discrete  analog  of,  54. 
Exponential generating functions, 350-355,
407-408.

Exponential  series,  generalized,  200-202, 
231,  350,  355. 

Exponents,  law  of,  52. 

ϕ, see Phi.

φ, see Euler's totient function.

Factorial  expansion  of  binomial  coefficients, 
156. 

Factorial  function,  111-115,  332-334. 
approximation  to,  see  Stirling’s  formula, 
duplication  formula,  232. 
generalized  to  nonintegers,  192,  210-211, 
213-214,  302. 

Factorial powers, 47-48, 63, 248.
complex,  211. 
negative,  52-53,  63. 

related  to  ordinary  powers,  248-249,  572. 
Factorization  into  primes,  106-107,  110. 
Factorization  of  summation  conditions,  36. 
Fair  coins,  387,  416. 

Fair  dice,  368,  403. 

Falling  factorial  powers,  47. 
complex,  211. 
difference  of,  48,  53. 
negative,  188. 

related  to  ordinary  powers,  51,  248-249, 
572. 

related  to  rising  powers,  63,  298. 

Fans,  ix,  193,  334. 

Farey,  John,  series,  118-119,  134,  137,  150, 
152,  448,  588. 
Feder, Tomas, 604.

Feigenbaum,  Joan,  601. 

Feller,  William,  367,  585,  605. 

Fermat,  Pierre  de,  130,  131,  585. 

numbers,  131-132,  145,  510. 

Fermat’s  Last  Theorem,  130,  150,  509,  532. 
Fermat’s  theorem  (=  Fermat’s  Little 
Theorem),  131,  141,  149. 
converse  of,  148. 

Fibonacci,  Leonardo,  95,  278,  527,  585,  602, 
603. 

algorithm,  95,  101. 
factorial,  478. 

number  system,  282-283,  287,  293,  296, 
303. 

odd  and  even,  293-294. 

Fibonacci  numbers,  276-287,  288,  307,  317. 
combinatorial  interpretations  of,  277,  278, 
288,  307. 

generating  function  for,  283-285,  323-326, 
337. 

second-order, 361.

Fine,  Henry  Burchard,  595. 

Fine,  Nathan  Jacob,  577. 

Finite  calculus,  47-56. 

Finite  state  language,  391. 

Finkel,  Raphael  Ari,  598. 

Fisher,  Michael  Ellis,  585,  604. 

Fisher,  Sir  Ronald  Aylmer,  586,  605. 

Fixed  point,  12,  379-380,  386-387,  414. 

Floor  function,  67-69. 

Floyd,  Robert  W,  603,  604. 

Food,  see  Candy,  Cheese,  Eggs,  Pizza, 
Sherry. 

Football,  182. 

Football victory problem, 193-196, 199-200,
414. 

generalized,  415. 

mean  and  variance,  379-380,  386-387. 
Forcadel,  Pierre,  586,  603. 

Formal power series, 206, 317, 517.
FORTRAN,  432. 


Fourier,  Jean  Baptiste  Joseph,  22,  586. 
series,  481. 

Fractional  part,  70,  83,  87,  456. 

Fractions,  116-123,  151. 
basic,  134,  138. 

continued,  287,  290-295,  304,  540. 
partial,  see  Partial  fraction  expansions, 
unit,  95,  150. 
unreduced,  134-135,  151. 

Fraenkel,  Aviezri  S,  500,  535,  586,  602. 
Frame,  James  Sutherland,  586,  602. 
Francesca,  Piero  della,  586,  604. 

Fraser,  Alexander  Yule,  2,  578. 

Frazer,  William  Donald,  586,  603. 

Fredman,  Michael  Lawrence,  499,  586. 

Free  variables,  22. 

Freiman, Grigoriĭ Abelevich, 581.

Friendly  monster,  526. 

Frisbees,  420-421,  423. 

Frye,  Roger  Edward,  131. 

Fundamental  Theorem  of  Arithmetic, 
106-107. 

Fundamental  Theorem  of  Calculus,  48. 

Fuss,  Nicolai  Ivanovich,  347,  586. 

Fuss-Catalan  numbers,  347. 

Fuss,  Paul  Heinrich  von,  584. 

γ, see Euler's constant.

Γ, see Gamma function.

Gale,  Dorothy,  556. 

Games,  see  Bowling,  Cards,  Cribbage,  Dice, 
Penny  ante,  Sports. 

Gamma  function,  210-214,  468,  513. 

Gardner,  Martin,  586,  603,  605. 

Garfunkel,  J.,  587,  605. 

Gauß (= Gauss), Karl (= Carl) Friedrich,
vii,  6,  7,  123,  205,  207,  212,  496,  514, 
583,  587,  602,  603. 

identity  for  hypergeometrics,  222,  235. 
trick,  6,  30,  112,  299. 
gcd:  Greatest  common  divisor. 
Generalization, 11, 13, 16.

downward,  2,  95,  306-307. 

Generalized  binomial  series,  200-204,  232, 
240,  349. 

Generalized  exponential  series,  200-202,  231, 
350,  355. 

Generalized  factorial  function,  192,  210-211, 
213-214,  302. 

Generalized  harmonic  numbers,  263,  269, 

272,  297,  302,  356. 

Generating  functions,  196-204,  283-285, 
306-366. 

for  Bernoulli  numbers,  271,  337,  351. 
for  convolutions,  339-350,  355,  407. 
Dirichlet,  356-357,  359,  418,  437. 
for  Eulerian  numbers,  337. 
exponential,  350-355. 
for Fibonacci numbers, 283-285, 323-326,
337. 

of  generating  functions,  337,  339,  407. 
for  harmonic  numbers,  337-338. 

Newtonian,  364. 
for  probabilities,  380-387. 
for  simple  sequences,  321. 
for  Stirling  numbers,  337,  407. 
super,  339,  407. 

Genocchi,  Angelo,  587. 
numbers,  528,  549. 

Geometric  progression,  32-33,  54,  114, 
205-206. 

Gessel,  Ira  Martin,  256,  587. 

Gibbs,  Josiah  Willard,  599. 

Gilbert,  William  Schwenck,  430. 

Ginsburg,  Jekuthiel,  587. 

Glaisher,  James  Whitbread  Lee,  constant, 
569. 

God,  1,  293. 

Goldbach,  Christian,  584. 

theorem,  66. 

Golden  ratio,  285. 

Golf,  417. 


Golomb,  Solomon  Wolf,  446,  493,  587,  602. 

self-describing  sequence,  66,  481. 

Good,  Irving  John,  587,  603. 

Goodfellow,  Geoffrey  Scott,  598. 

Gopinath,  Bhaskarpillai,  487,  592. 

Gordon,  Peter  Stuart,  ix. 

Gosper,  Ralph  William,  Jr.,  224,  487,  540, 
587,  603. 

algorithm,  224-226,  519. 
algorithm,  examples,  227-228,  233,  519. 
goto,  considered  harmful,  173. 

Gottschalk,  Walter  Helbig,  vii. 

Graffiti,  vii,  ix,  59,  606. 

Graham,  Cheryl,  ix. 

Graham,  Ronald  Lewis,  iii,  iv,  vi,  ix,  102, 
492,  582-584,  587-588,  598,  601,  602. 
Grandi,  Luigi  Guido,  58,  588. 

Graph,  334,  360. 

Graves,  William  Henson,  601. 

Gravity,  center  of,  259-260. 

Gray,  Frank,  code,  483. 

Greatest  common  divisor,  92,  103-104,  107, 
145. 

Greatest  integer  function,  see  Floor  func- 
tion. 

Greatest  lower  bound,  65. 

Greed,  74,  373-374;  see  also  Rewards. 
Greedy  algorithm,  101,  281. 

Green,  Research  Sink,  581. 

Greene,  Daniel  Hill,  588. 

Greitzer,  Samuel  Louis,  588,  602. 

Gross,  Oliver  Alfred,  588,  604. 

Grünbaum, Branko, 484, 588.

Grundy,  Patrick  Michael,  597,  602. 

Guibas,  Leonidas  Ioannis  (=  Leo  John),  588, 
601,  605. 

Guy,  Richard  Kenneth,  500,  510,  588. 

Haar,  Alfred,  vii. 

Hacker’s  Dictionary,  124,  598. 

Haiman,  Mark,  601. 

Half-open  interval,  73-74. 
Hall, Marshall, Jr., 588.

Halmos,  Paul  Richard,  v,  vi,  588. 

Halphen,  Georges  Henri,  291,  588. 

Halving,  79,  186-187. 

Hamburger,  Hans  Ludwig,  566,  589. 
Hammersley,  John  Michael,  v,  589,  604. 
Hanoi,  Tower  of,  1-4,  26-27,  109,  146. 

variations on, 17-19.

Hansen,  Eldon  Robert,  42,  589. 

Hardy,  Godfrey  Harold,  111,  428,  589,  602, 
605. 

Harmonic  numbers,  29,  258-268,  466. 
analogous  to  logarithms,  53. 
approximate values of, 262-264.
complex,  297,  302. 
divisibility  of,  297,  300,  304. 
generalized,  263,  269,  272,  297,  302,  356. 
generating  function  for,  337-338. 
second-order,  263,  266,  297,  529. 
sums  of,  41,  56,  265-268,  298-299,  302, 
340-341. 

Harmonic  series,  divergence  of,  62,  262. 
Harry,  Matthew  Arnold,  double  sum,  237. 
Hashing,  397-412. 

Hats,  193-196,  199-200,  379-380,  386-387, 
414,  415. 
hcf,  103. 

Heath-Brown,  David  Rodney,  599. 

Heiberg,  Johan  Ludvig,  584. 

Heisenberg, Werner Karl, 467.

Helmbold,  David  Paul,  601. 

Henrici,  Peter  Karl  Eugen,  318,  526,  576, 
589,  603,  605. 

Hermite,  Charles,  524,  532,  589,  603. 
Herstein,  Israel  Nathan,  8,  589. 

Hexagon  property,  155,  230,  239. 

Hillman,  Abraham  P,  589,  603. 

Hoare,  Charles  Antony  Richard,  28,  73,  589. 
Hofstadter,  Douglas  Richard,  602. 

Hoggatt, Verner Emil, Jr., 589, 593, 603.
Holden, Edward Singleton, 595.

Holmboe, Berndt Michael, 578.


Holmes,  Thomas  Sherlock  Scott,  162, 

227-228. 

Holomorphic  functions,  196. 

Horses,  17,  454,  489. 

Hsu,  Lee-Tsch  (=  Lietz  = Leetch) 

Ching-Siur,  589,  603. 

Hurwitz,  Adolf,  604. 

Hyperbolic  functions,  271-272. 

Hyperfactorial,  231,  477. 

Hypergeometric  series,  204-223. 
degenerate,  210,  216,  222,  235. 
differential  equation  for,  219-221. 
partial  sums  of,  165-166,  223-230,  233. 
transformations  of,  216-223,  235,  241. 
Hypergeometric  terms,  224,  231,  233. 

i,  22. 

ℑ: Imaginary part, 64.

Implicit  recurrences,  136-138,  193-194,  270. 
Indefinite  summation,  48-49,  55-56,  161, 
224-230. 

Independent  random  variables,  370,  413,  423. 
Index  set,  22,  30,  61. 

Index  variable,  22,  34,  60. 

Induction,  3,  7,  10-11,  17,  43. 
backwards,  18. 
basis  of,  3,  306-307. 
failure  of,  550. 

important  lesson  about,  494,  526. 

Inductive  leap,  4,  43. 

Inequality,  Cauchy’s,  64. 

Chebyshev’s,  376-377,  414,  416,  555. 
Chebyshev’s  summation,  38. 

Infinite  sums,  56-62,  64. 

Information  retrieval,  397-399. 

Inkeri,  Kustaa,  509,  590. 

INT  function,  67. 

Integer  part,  70. 

Integration,  45-46,  48,  319,  351. 
by  parts,  54,  458. 

Interchanging  the  order  of  summation, 

34-41,  105,  136,  183,  185. 




Interpolation,  191-192. 

Intervals,  73-74. 

Invariant  relation,  117. 

Inverse  modulo  m,  125,  132,  147. 

Inversion  formulas,  136,  138,  192-193. 
Irrational  numbers,  87,  122-123. 

Iverson,  Kenneth  Eugene,  24,  67,  590,  602. 
convention,  24,  31,  34,  68,  75,  587. 

Jacobi,  Carl  Gustav  Jacob,  64,  590. 
Jarden,  Dov,  533,  590. 

Jeopardy,  347. 

Joint  distribution,  370. 

Jonassen,  Arne  Tormod,  590. 

Jones,  Bush,  590. 

Josephus,  Flavius,  8,  12,  19-20,  590. 
numbers,  81,  97,  100. 
problem,  8-17,  79-81,  95,  100,  144. 
recurrence,  generalized,  13-16,  79-81. 
subset,  20. 

Jouaillec,  Louis  Maurice,  601. 

Jungen,  R.,  590,  604. 

Kafkaesque  scenario,  260. 

Kaplansky,  Irving,  8,  589. 

Karlin,  Anna  Rochelle,  601. 

Kaucky,  Josef,  590,  604. 

Kellogg,  Oliver  Dimon,  582. 

Kent,  Clark  (=  Kal-El),  358. 

Kernel  functions,  356. 

Ketcham,  Henry  King,  148. 

Kilometers,  287,  296. 

Kilroy,  James  Joseph,  vii. 

Kipling,  Joseph  Rudyard,  246. 

Kissinger,  Henry  Alfred,  365. 

Klamkin,  Murray  Seymour,  590,  602,  603. 
Klarner,  David  Anthony,  601. 

Knockout  tournament,  418-419. 

Knopp,  Konrad,  590,  605. 

Knuth,  Donald  Ervin,  iii-vi,  viii,  ix,  102, 
253,  397,  492,  531,  588,  590-591,  601, 
605,  625. 

numbers,  78,  97,  100. 


Knuth,  John  Martin,  605. 

Knuth,  Nancy  Jill  Carter,  ix. 

Kramp,  Christian,  111,  591. 

Kronecker,  Leopold,  delta  notation,  24. 
Kummer,  Ernst  Eduard,  206,  514,  591-592, 
603. 

formula  for  hypergeometrics,  213,  217. 
Kurshan,  Robert  Paul,  487,  592. 

A-notation,  65. 

Lagny,  Thomas  Fantet  de,  290,  592. 

Lagrange  (=  de  la  Grange),  Joseph  Louis, 
comte,  592,  604. 
identity,  64. 

Lah,  Ivo,  592,  603. 

Landau,  Edmund  Georg  Hermann,  429,  434, 
592,  603,  605. 

Laplace,  Pierre  Simon,  marquis  de,  452,  580, 
592. 

Last  but  not  least,  132,  455. 

Law  of  Large  Numbers,  377. 
lcm:  Least  common  multiple,  103. 

Least  common  multiple,  103,  107. 

Least  integer  function,  see  Ceiling  function. 
Least  upper  bound,  57,  61. 

LeChiffre,  Mark  Well,  148. 

Left-to-right  maxima,  302. 

Legendre,  Adrien  Marie,  548,  592,  602. 
Lehmer,  Derrick  Henry,  592,  602,  604. 

Leibniz,  Gottfried  Wilhelm,  Freiherr  von,  vii, 
168,  588,  593. 

Lekkerkerker,  Cornelius  Gerrit,  593. 

Levels  of  exercises,  viii,  72-73,  95,  497. 
Levine,  Eugene,  584,  604. 

Lexicographic  order,  427. 
lg:  Binary  logarithm,  70. 

L’Hospital,  Guillaume  François  Antoine  de, 
marquis  de  Sainte  Mesme,  rule,  326,  382. 

Liang,  Franklin  Mark,  601. 

Lieb,  Elliott  Hershel,  593,  605. 

Lies,  and  statistics,  195. 
Lincoln,  Abraham,  387. 

Lines  in  the  plane,  4-8,  17,  19. 

Little  oh  notation,  434. 

ln:  Natural  logarithm,  262. 
log:  Common  logarithm,  435. 

Logan,  Benjamin  Franklin  (=  Tex),  Jr.,  273, 
593,  602-604. 

Logarithmico-exponential  functions,  428-429. 

Logarithms,  53-54,  70,  262,  435. 

Long,  Calvin  Thomas,  593,  603. 

Lottery,  373-374,  422-423. 

Lower  index,  154. 

Lower  parameters,  205. 

Loyd,  Samuel,  536,  593. 

Lucas,  François  Édouard  Anatole,  1,  278, 
593,  602-604. 
numbers,  298,  302. 

Lyness,  Robert  Cranston,  487,  593,  602. 
Lytton,  Edward  George  Earle  Lytton 
Bulwer,  baron,  v. 

μ,  see  Möbius  function. 

Maclaurin,  Colin,  455,  593. 

MacMahon,  Maj.  Percy  Alexander,  140,  593. 
MACSYMA,  42,  525. 

Magic  tricks,  279. 

Mallows,  Colin  Lingwood,  492. 

Markov,  Andrei  Andreevich  (the  elder), 
processes,  391. 

Martian  DNA,  363. 

Mathematical  induction,  3,  7,  10-11,  17,  43. 
backwards,  18. 

basis  of,  3,  306-307. 
failure  of,  550. 

important  lesson  about,  494,  526. 

Mathews,  Edwin  Lee  (=  41),  8,  21,  94,  105, 
106,  329. 

Matiiasevich  (=  Matijasevich),  Iurii  (=  Yuri) 
Vladimirovich,  280,  593,  603. 

Maxfield,  Margaret  Waugh,  599,  604. 

Mayr,  Ernst,  ix,  601,  602. 

McEliece,  Robert  James,  71. 


McGrath,  James  Patrick,  601. 

McKellar,  Archie  Charles,  586,  603. 

Mean  (average)  of  a probability  distribution, 
370-381. 

Median,  370,  371,  423. 

Mediant,  116. 

Melzak,  Zdzislaw  Alexander,  vi,  594. 
Mendelsohn,  Nathan  Saul,  594,  603. 
Merchant,  Arif  Abdulhussein,  601. 

Merging,  79,  175. 

Mersenne,  Marin,  109,  131,  585. 
numbers,  109-110,  151,  278. 
primes,  109-110,  127,  507. 

Miles,  287,  296. 

Mills,  Stella,  593. 

Mills,  William  Harold,  594,  603. 

Minimum,  65,  237,  363. 

Mirsky,  Leon,  604. 

Mixture  of  probability  distributions,  414. 
Möbius,  August  Ferdinand,  136. 

function,  136-139,  357,  448-449,  501. 
mod:  binary  operation,  81-85. 
mod:  congruence  relation,  123-126. 
mod  0,  82-83,  500. 

Mode,  370,  371,  423. 

Modular  arithmetic,  123-129. 

Modulus,  82. 

Moessner,  Alfred,  594,  604. 

Moments,  384-385. 

Montgomery,  Peter  Lawrence,  594,  603. 
Moriarty,  James,  162. 

Morse,  Samuel  Finley  Breese,  code,  288,  310. 
Moser,  Leo,  594,  602. 

Motzkin,  Theodor  Samuel,  533,  539,  590, 
594. 

Mountain  ranges,  345,  541. 

Mozzochi,  Charles  Jeffrey,  594. 

Mu  function,  136-139,  357,  448-449,  501. 
Multinomial  coefficients,  168,  171-172,  240, 
545. 

Multiple  of  a number,  102. 

Multiple  sums,  34-41,  61. 

Multiple-precision  numbers,  127. 




Multiplicative  functions,  134-136,  357. 
Multisets,  77,  256. 

Mumble  function,  83,  84,  492,  499. 
Mumble-fractional  part,  88. 

Murdock,  Phoebe  James,  viii. 

Murphy’s  Law,  74. 

Myers,  Basil  Roland,  594,  604. 

ν,  see  Nu  function. 
nth  difference,  267. 

Name  and  conquer,  2,  32,  88,  139. 

National  Science  Foundation,  ix. 

Natural  logarithm,  53-54,  262. 

Naval  Research,  ix. 

Navel  research,  285. 

Nearest  integer,  95. 

Necessary  and  sufficient  condition,  72. 
Necklaces,  139-141,  245. 

Negating  the  upper  index,  164-165. 
Negative  binomial  distribution,  388-389, 
414. 

Negative  factorial  powers,  52,  63,  188. 
Newman,  James  Roy,  600. 

Newman,  Morris,  604. 

Newton,  Sir  Isaac,  189,  263,  594. 
series,  189-192. 

Newtonian  generating  function,  364. 

Niven,  Ivan  Morton,  318,  594,  602. 
Nontransitive  paradox,  396. 

Normal  distribution,  424. 

Notation,  x-xi,  2,  21-25,  48-49,  67-70,  73, 
81,  102,  111,  115,  123-124,  194,  243. 
extension  of,  49,  52,  154,  210-211,  252, 
257,  297. 
ghastly,  67,  175. 
need  for  new,  83,  115,  253. 

Nu  function,  12,  114,  146,  529. 

Null  case,  2,  306-307,  335,  541. 

Number  system,  107,  119. 
binomial,  234. 

Fibonacci,  282,  296,  303. 
prime-exponent,  107,  116. 


radix,  11,  16,  109,  146,  148,  195,  233,  446, 
511. 

residue,  126-129,  144. 

Stern-Brocot,  119-123,  146,  292,  504,  527. 
Number  theory,  102-152. 

o,  considered  harmful,  434-435. 

O-notation,  76,  429-435. 

Obvious,  clarified,  403,  511. 

Odds,  396. 

Odlyzko,  Andrew  Michael,  81,  540,  588,  605. 
Office  of  Naval  Research,  ix. 

One-way  equalities,  432-433. 

Open  interval,  73-74,  96. 

Operators,  47,  55,  219. 

Optical  illusions,  278,  279,  536. 

Organ-pipe  order,  509. 

π,  26,  70,  146,  232,  471,  540,  570. 
Π-notation,  64,  106. 

Pacioli,  Luca,  586. 

Palais,  Richard  Sheldon,  viii. 

Paradoxes,  279,  396,  515. 

Paradoxical  sums,  57. 

Parallel  summation,  159,  174,  208-209. 
Parentheses,  343-345. 

Parenthesis  conventions,  xi. 

Partial  fraction  expansions,  64,  189,  284-285, 
324-327,  360,  362,  462,  490,  535. 

Partial  quotients,  292,  304,  540. 

Partial  sums,  48-49,  55-56,  161,  165-166, 
223-230,  233. 

required  to  be  positive,  345-348. 

Partition  into  nearly  equal  parts,  83-85. 
Partitions,  of  the  integers,  77-78,  99,  101. 
of  a number,  316. 
of  a set,  244-245. 

Pascal,  Blaise,  155,  156,  594,  602. 

Pascal’s  triangle,  155. 
extended  upward,  164. 
row  products,  231. 
row  sums,  163,  165. 
variant  of,  238. 

Patashnik,  Amy  Markowitz,  ix. 
Patashnik,  Oren,  iii,  iv,  vi,  ix,  102,  492,  588, 
601. 

Patil,  Ganapati  Parashuram,  594,  605. 
Peirce,  Charles  Santiago  Sanders,  510,  595, 
603. 

sequence,  151. 

Penney,  Walter  Francis,  394,  595. 

Penney  ante,  394-396,  416,  423,  424. 
Pentagon,  300  (exercise  46),  416,  420. 
Pentagonal  numbers,  366. 

Percus,  Jerome  Kenneth,  595,  604. 

Perfect  powers,  66. 

Periodic  recurrences,  179. 

Permutations,  111-112,  193-196. 
ascents  in,  253-254,  256. 
up-down,  363. 

Personal  computer,  109. 

Perturbation  method,  32-33,  43-44,  64,  179, 
270-271. 

Pfaff,  Johann  Friedrich,  207,  217,  595,  603. 

reflection  law,  217,  235. 
pgf:  Probability  generating  function, 

Phages,  420,  424. 

Phi  (=  Golden  ratio),  70,  97,  285-287,  296, 
530. 

Phi  function  (=  Totient  function),  133-135, 
137-144,  357,  448-449. 

Phidias,  285. 

Philosophy,  vii,  11,  16,  46,  71,  72,  75,  91, 
170,  181,  194,  317,  453,  489,  494,  577. 
Phyllotaxis,  277. 

Pi,  26,  70,  146,  232,  471,  540,  570. 

Pig,  Porky,  482. 

Pigeonhole  principle,  130. 

Pincherle,  Salvatore,  589. 

Pisano,  Leonardo,  585,  see  Fibonacci. 

Pittel,  Boris  Gershon,  552. 

Pizza,  4,  409. 

Planes,  cutting,  19. 

Pneumathics,  164. 

Pochhammer,  Leo,  48,  595. 
symbol,  48. 

Pocket  calculators,  67,  330. 


Poincaré,  Jules  Henri,  595,  605. 

Poisson,  Siméon  Denis,  457,  595. 
distribution,  414,  554. 
summation  formula,  576. 

Pollak,  Henry  Otto,  588,  602. 

Pólya,  George  (=  György),  vi,  16,  313,  494, 
595,  602,  604,  605. 

Polygons,  20,  360,  365. 

Polynomial  argument,  158,  163,  210. 

Polynomially  recursive  sequence,  360. 

Polynomials,  189-191. 
degree  of,  158,  226. 
divisibility  of,  225. 
reflected,  325. 

Poonen,  Bjorn,  487,  595,  602. 

Porter,  Thomas  K,  601. 

Portland  cement,  see  Concrete  (in  another 
book). 

Power  series,  196,  see  Generating  functions. 
formal,  206,  317,  517. 

Pr,  367-368. 

Pratt,  Vaughan  Ronald,  601. 

Primality  testing,  110,  148. 

Prime  numbers,  23,  105-111,  442. 
largest  known,  109-110. 

Mersenne,  109-110,  127,  507. 
size  of  nth,  110-111,  442-443. 

Prime  to,  115. 

Prime-exponent  representation,  107,  116. 

Princeton  University,  ix,  413. 

Probabilistic  analysis  of  an  algorithm, 
399-412. 

Probability,  195,  367-424. 

conditional,  402-405,  410-411. 
discrete,  367-424. 
distribution,  367. 
generating  function,  380-387. 
space,  367. 

Product  of  consecutive  odd  numbers,  186, 
256. 

Product  notation,  64,  106. 

Progression,  arithmetic,  26,  30,  362. 
geometric,  32-33,  54,  114,  205-206. 
Proof,  4,  7. 

Property,  23,  34. 

Pulling  out  the  large  part,  439,  444. 

Puns,  ix,  220. 

Pythagoras  of  Samos,  theorem,  495. 

Quadratic  domain,  147. 

Questions,  levels  of,  viii,  72-73,  95,  497. 
Quicksort,  28. 

Quotation  marks,  xi. 

Quotient,  81. 

ℜ:  Real  part,  64,  212,  437. 

Rabbits,  296. 

Radix  notation,  11,  16,  109,  146,  148,  195, 
233,  446,  511. 

Radix-2  representation,  11-13,  15,  70,  113. 
Rado,  Richard,  595,  604. 

Rainville,  Earl  David,  514,  595. 

Ramanujan  Aiyangar,  Srinivasa,  316. 
Ramshaw,  Lyle  Harold,  73,  601,  603,  605. 
Random  variables,  369-372. 

independent,  370,  413,  423. 

Raney,  George  Neal,  345,  348,  596,  604. 
lemma,  345-346. 
lemma,  generalized,  348,  358. 
sequences,  347. 

Rao,  D.  Rameswar,  596,  602. 

Rational  function,  207,  324. 

Rayleigh,  John  William  Strutt,  baron,  77, 
596. 

Real  part,  64,  212,  437. 

Reciprocity  law,  94. 

Recorde,  Robert,  432,  596. 

Recurrences,  1,  3-4,  6,  10,  13,  78-81,  103, 
159,  323. 

doubly  exponential,  97,  100,  101,  109. 

implicit,  136-138,  193-194,  270. 

periodic,  20,  179. 

solving,  323-336. 

and  sums,  25-29. 

unfolding,  6,  100,  159-160,  298. 

unfolding  asymptotically,  442. 

Referee,  175. 


Reference  books,  42,  223,  590. 

Reflected  light  rays,  277. 

Reflected  polynomial,  325. 

Reflection  law  for  hypergeometrics,  217,  235. 
Regions,  4-5,  17,  19. 

Reich,  Simeon,  596,  605. 

Relative  error,  438,  441. 

Relatively  prime  integers,  108,  115-123. 
Remainder  after  division,  81. 

Remainder  in  Euler’s  summation  formula, 
457,  460-461,  465-466. 

Renz,  Peter  Lewis,  viii. 

Repertoire  method,  15,  19,  26,  44-45,  63, 
238,  298,  300,  358. 

Replicative  function,  100. 

Residue  number  system,  126-129,  144. 
Retrieving  information,  397-399. 

Rewards,  monetary,  ix,  242,  483,  510,  550. 
Rham,  Georges  de,  596,  604. 

Ribenboim,  Paolo,  532,  596,  603. 

Rice,  Stephan  Oswald,  595. 

Rice  University,  ix. 

Riemann,  Georg  Friedrich  Bernhard,  205, 
596,  602. 
hypothesis,  511. 

zeta  function,  65,  263-264,  272,  356-357, 
449,  511,  542,  547,  569,  571,  575. 
Rising  factorial  powers,  48,  63,  211. 
related  to  falling  powers,  63,  298. 
related  to  ordinary  powers,  249,  572. 
Roberts,  Samuel,  596,  602. 

Rocky  road,  36. 

Rødseth,  Øystein  Johan,  596,  603. 
Rolletschek,  Heinrich  Franz,  499. 

Roots  of  unity,  149,  204,  361,  530,  550,  572. 

modulo  m,  128-129. 

Rosser,  John  Barkley,  111,  596. 

Rota,  Gian-Carlo,  501,  596. 

Roulette  wheel,  74-75. 

Rounding,  unbiased,  492. 

Roy,  Ranjan,  596,  603. 

Rubber  band,  260-261,  264,  298,  479. 

Ruler  function,  113,  146,  148. 
Running  time,  411-412. 

Ruzsa,  Imre  Zoltan,  584. 

σ,  374. 

Σ-notation,  22-25. 

Saalschütz,  Louis,  596,  603. 
identity,  214. 

Sample  mean  and  variance,  377-379,  413. 
Samplesort,  340. 

Sandwiching,  157,  165. 

Sárközy,  András,  526,  596. 

Sawyer,  Walter  Warwick,  207,  597. 
Schaffer,  Alejandro  Alberto,  601. 

Schinzel,  Andrzej,  510. 

Schlömilch,  Oscar  Xaver,  597. 

Schoenfeld,  Lowell,  111,  596. 

Schönheim,  Johanen,  581. 

Schröder,  Ernst,  597,  604. 

Schrödinger,  Erwin,  416. 

Schröter,  Heinrich  Eduard,  597,  604. 
Schützenberger,  Marcel  Paul,  605. 

Scorer,  Richard  Segar,  597,  602. 

Searching  a table,  397-399. 

Seaver,  George  Thomas  (=  41),  8,  21,  94, 
105,  106,  329. 

Second-order  Eulerian  numbers,  256-257. 
Second-order  Fibonacci  numbers,  361. 
Second-order  harmonic  numbers,  263,  266, 
297,  529. 

Sedgewick,  Robert,  601. 

Sedláček,  Jiří,  597,  604. 

Self-certifying  algorithms,  104. 
Self-describing  sequence,  66,  481. 

Self  reference,  59,  515-524,  588,  620. 

Set  inclusion  in  O-notation,  432. 

Shallit,  Jeffrey  Outlaw,  597,  603. 
Sharkansky,  Stefan  Michael,  601. 

Sharp,  Robert  Thomas,  259,  597. 

Sherry,  419. 

Shift  operator,  55,  188,  191. 

Shiloach,  Joseph  (=  Yossi),  601. 

Shor,  Peter  Williston,  602. 

Sicherman,  George  Leprechaun,  605. 


Sideways  addition,  12,  114,  146,  238,  529. 
Sierpinski,  Waclaw,  87,  597,  603. 

Sieve  of  Eratosthenes,  111. 

Sigma-notation,  22-25. 

Signum,  488. 

Silverman,  David  L,  597,  604. 

Skepticism,  71. 

Skiena,  Steven  Sol,  526. 

Slater,  Lucy  Joan,  223,  597. 

Sloane,  Neil  James  Alexander,  42,  327,  578, 
597,  602. 

Small  cases,  2,  5,  9,  155,  306-307,  316. 

Smith,  Cedric  Austen  Bardell,  597,  602. 
Snowwalker,  Luke,  421. 

Solov’ev,  Aleksandr  Danilovitch,  394,  598. 
Solution,  3,  323. 

Sorting,  28,  79,  175,  340,  434. 

Spanning  trees,  334-336,  342,  354-355,  360. 
Spec,  77-78,  96,  97,  99,  101. 

Special  numbers,  243-305. 

Spectrum,  77-78,  96,  97,  99,  101,  293,  304. 
Spiral  function,  99. 

Spohn,  William  Gideon,  Jr.,  598. 

Sports,  see  Baseball,  Football,  Frisbees, 
Golf,  Tennis. 

Square  pyramidal  numbers,  42. 

Square  root,  of  1 (mod  m),  128-129. 
of  2,  100. 
of  3,  364. 

Squarefree,  145,  151,  359. 

Squares,  sum  of  consecutive,  41-46,  51,  180, 
233,  255,  270,  274,  353,  430,  456. 

Stack  size,  346-347. 

Stacking  cards,  259-260,  295. 

Stallman,  Richard  Matthew,  598. 

Standard  deviation,  374,  376-380. 

Stanford  University,  v,  vii,  ix,  413,  625. 
Stanley,  Richard  Peter,  256,  519,  587,  598, 
604,  605. 

Staudt,  Karl  Georg  Christian  von,  598,  604. 
Steele,  Guy  Lewis,  Jr.,  598. 

Stegun,  Irene  Anne,  42,  578. 

Stein,  Sherman  Kopald,  602. 
Steiner,  Jacob,  5,  598,  602. 

Steinhaus,  Hugo  Dyonizy,  605. 

Stengel,  Charles  Dillon  (=  Casey),  42. 

Step  functions,  87. 

Stern,  Moriz  Abraham,  116,  598. 
Stern-Brocot  number  system,  119-123,  146, 
292,  504,  527. 

Stern-Brocot  tree,  116-123,  291-292,  364, 
510. 

Stern-Brocot  wreath,  500. 

Stewart,  Bonnie  Madison,  586,  602. 
Stickelberger,  Ludwig,  598,  602. 

Stieltjes,  Thomas  Jan,  589. 
constants,  569,  575. 

Stirling,  James,  192,  210,  243,  244,  283,  467, 
598. 

constant,  467,  471-475. 
formula,  112,  467-468,  477. 
formula,  perturbed,  440-441. 
numbers,  see  Stirling  numbers. 
polynomials,  257-258,  276,  297,  338-339. 

triangles,  244,  245,  253. 

Stirling  numbers,  243-253,  275-276,  478, 
577. 

combinatorial  interpretations,  244-248. 

convolution  formulas,  258,  276. 

of  the  first  kind,  245. 

generalized,  257-258,  302,  304,  572. 

generating  functions  for,  337. 

identities  for,  250-251,  258,  276,  303,  364. 

of  the  second  kind,  244. 

as  sums  of  products,  545. 

Stone,  Marshall  Harvey,  vi. 

Straus,  Ernst  Gabor,  539,  584,  594. 
Subfactorial,  194,  238. 

Summand,  22. 

Summation,  21-66. 
asymptotic,  87-89,  452-482. 

changing  the  index  of,  30-31,  39. 
definite,  49-50. 
difficulty  measure  for,  181. 
over  divisors,  104-105,  135-137,  141,  356. 
factor,  27-29,  64,  261. 


indefinite,  48-49,  55-56,  161,  224-230. 
infinite,  56-62,  64. 

interchanging  the  order  of,  34-41,  105, 
136,  183,  185. 

parallel,  159,  174,  208-209. 
by  parts,  54-56,  63,  265. 
over  triangular  arrays,  36-41. 
on  the  upper  index,  160-161,  176. 

Sums,  21-66. 

absolutely  convergent,  60-61,  64. 
approximation  of,  by  integral,  45,  262-263, 
455-461. 

of  consecutive  cubes,  51,  63,  269,  275,  353. 
of  consecutive  integers,  6,  44,  65. 
of  consecutive  mth  powers,  42,  269-271, 
274-276,  352-354. 

of  consecutive  squares,  41-46,  51,  180, 
255,  270,  274,  353,  430,  456. 
divergent,  60,  517. 
double,  34-41,  105,  237. 
doubly  infinite,  59,  98,  468-469. 
empty,  23,  48. 
floor/ceiling,  86-94. 
formal,  307,  317-318. 
of  harmonic  numbers,  41,  56,  265-268, 
298-299,  302,  340-341. 
hypergeometric,  see  Hypergeometric 
series. 

infinite,  56-62,  64. 
multiple,  34-41,  61. 
notations  for,  21-25. 
paradoxical,  57. 

partial,  48-49,  55-56,  161,  165-166, 
223-230,  233. 
and  recurrences,  25-29. 
tail  of,  452-455. 

Sun  Tsu,  126. 

Sunflower,  277. 

Super  generating  functions,  339,  407. 
Superfactorial,  149,  231. 

Swanson,  Ellen  Esther,  viii. 

Sweeney,  Dura  Warren,  598. 

Swinden,  B.A.,  602. 
Sylvester,  James  Joseph,  598,  602. 

Symmetry  identities,  156,  254. 

Szegedy,  Mario,  510,  581,  599. 

Szegő,  Gábor,  595,  605. 

ϑ,  see  Theta  operator. 

Θ,  see  Big  Theta  notation. 

Tail  inequalities,  414,  416. 

Tail  of  a sum,  452-455. 

Tale  of  a sum,  see  Squares. 

Tangent  function,  273,  303. 

Tangent  numbers,  273. 

Tanner,  Jonathan  William,  131,  599. 

Tanny,  Stephen  Michael,  599,  604. 

Tartaglia,  Nicolò,  triangle,  155. 

Taylor,  Brook,  series,  163,  191,  382,  456-457. 
Telescoping,  50. 

Tennis,  418-419. 

Term,  21. 

Term  ratio,  207-209,  211-212. 

TeX,  219,  418,  625. 

Thackeray,  Henry  St.  John,  590. 

Theisinger,  Ludwig,  599,  603. 

Theory  of  numbers,  102-152. 

Theory  of  probability,  367-424. 

Theta  functions,  469,  509. 

Theta  operator,  219-221,  296. 

Thiele,  Thorvald  Nicolai,  383,  384,  599. 
Thinking,  489. 
big,  2,  427,  444,  469,  472. 
not  at  all,  56,  489. 

small,  see  Downward  generalization,  Small 
cases. 

Three-dots  (···)  notation,  21,  50,  108. 
Titchmarsh,  Edward  Charles,  599,  605. 

Todd,  H.,  487. 

Tong,  Christopher  Hing,  601. 

Totient  function,  133-135,  137-144,  357, 
448-449. 

Toto,  556. 

Tournament,  418-419. 

Tower  of  Brahma,  1,  4,  264. 


Tower  of  Hanoi,  1-4,  26-27,  109,  146. 

variations  on,  17-19. 

Trabb  Pardo,  Luis  Isidoro,  601. 

Transitive  law,  124. 

failure  of,  396. 

Traps,  154,  157,  183,  222. 

Trees,  of  bees,  277. 
binary,  117. 

spanning,  334-336,  342,  354-355,  360. 
Stern-Brocot,  116-123,  291-292,  364,  510. 
Triangular  array,  summation  over,  36-41. 
Triangular  numbers,  6,  366. 

Tricomi,  Francesco  Giacomo  Filippo,  599, 
605. 